InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem Paper • 2602.14367 • Published Feb 16, 2026 • 17
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6, 2026 • 83
Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling Paper • 2511.06934 • Published Nov 10, 2025
Causal Regime Detection in Energy Markets With Augmented Time Series Structural Causal Models Paper • 2511.04361 • Published Nov 6, 2025
How the Misuse of a Dataset Harmed Semantic Clone Detection Paper • 2505.04311 • Published May 7, 2025
Sentinel: A Hyper-Heuristic for the Generation of Mutant Reduction Strategies Paper • 2103.07241 • Published Mar 12, 2021
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30, 2025 • 74
Judging the Judges: A Collection of LLM-Generated Relevance Judgements Paper • 2502.13908 • Published Feb 19, 2025 • 5
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 50
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks Paper • 1809.03211 • Published Sep 10, 2018
Natural Language Inference over Interaction Space: ICLR 2018 Reproducibility Report Paper • 1802.03198 • Published Feb 9, 2018 • 1
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library Paper • 1610.00768 • Published Oct 3, 2016