AHELM: A Holistic Evaluation of Audio-Language Models Paper • 2508.21376 • Published Aug 29, 2025 • 9
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 60
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales Paper • 2510.10880 • Published Oct 13, 2025
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Paper • 2311.16101 • Published Nov 27, 2023 • 1
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation Paper • 2410.09040 • Published Oct 11, 2024
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw Paper • 2604.04759 • Published 9 days ago • 22
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 28 days ago • 138
Olivia714/qwen7b-distill-thinkflag1-all_10_or_1_9_plus_2_9_5k-epoch0 Text Generation • 8B • Updated Oct 9, 2025 • 1
Olivia714/llama8b-distill-thinkflag1-all_10_or_1_9_plus_2_9_5k-epoch0 Text Generation • 8B • Updated Oct 9, 2025 • 1