GIFT Eval
GIFT-Eval: A Benchmark for General Time Series Forecasting
Future Optical Flow Prediction Improves Robot Control & Video Generation
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering
A realistic benchmark with real CRM tasks for LLM agents.
View and submit LLM benchmark evaluations
Filter and view LLM benchmark data
Explore efficient reasoning techniques with large language models
Caption and chat about images