SLR-Bench Leaderboard - Reward Hacking in Reasoning Models
Reward shortcut behavior in LLMs via IPT
Compare regular and safe versions of generated images
View and rank time series forecasting submissions
Evaluate logical rules for genuine vs shortcut
Generate safety assessments for images
Evaluate logical rules with a validation program
Explore how Stable Diffusion and Fair Diffusion represent different professions