CMU-LTI

university



Recent Activity

seungone submitted a paper about 5 hours ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

seungone authored a paper 12 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

seungone submitted a paper 12 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists


Papers

Benchmark Test-Time Scaling of General LLM Agents

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models


submitted a paper to Daily Papers about 5 hours ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 1 day ago • 37

authored a paper 12 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 13 days ago • 12

submitted a paper to Daily Papers 12 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 13 days ago • 12

authored 2 papers 20 days ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 24 days ago • 80

submitted a paper to Daily Papers about 1 month ago

Building a Precise Video Language with Human-AI Oversight

Paper • 2604.21718 • Published Apr 22 • 17

authored 7 papers about 2 months ago

T-Eval: Evaluating the Tool Utilization Capability Step by Step

Paper • 2312.14033 • Published Dec 21, 2023 • 2

Building Cooperative Embodied Agents Modularly with Large Language Models

Paper • 2307.02485 • Published Jul 5, 2023 • 12

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

Paper • 2401.12975 • Published Jan 23, 2024

Agentic-R1: Distilled Dual-Strategy Reasoning

Paper • 2507.05707 • Published Jul 8, 2025

Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management

Paper • 2510.06727 • Published Oct 8, 2025 • 5

Training Proactive and Personalized LLM Agents

Paper • 2511.02208 • Published Nov 4, 2025

Mind the Sim2Real Gap in User Simulation for Agentic Tasks

Paper • 2603.11245 • Published Mar 11

updated a dataset about 2 months ago

cmu-lti/tau-usi

Updated Apr 6 • 110 • 3

published a dataset about 2 months ago

cmu-lti/tau-usi

Updated Apr 6 • 110 • 3

updated a dataset 3 months ago

cmu-lti/machine-translation-for-vision

Viewer • Updated Mar 3 • 696 • 189 • 1

lixiaochuan2020 submitted a paper to Daily Papers 3 months ago

Benchmark Test-Time Scaling of General LLM Agents

Paper • 2602.18998 • Published Feb 22 • 9

authored a paper 3 months ago

Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Paper • 2602.05073 • Published Feb 4 • 11

published a Space 4 months ago

MachineTranslationforVision

Explore competition details and submit entries

updated a Space 4 months ago

MachineTranslationforVision

Explore competition details and submit entries