NuMarkdown 8b Thinking: reasoning model specialized for OCR/Markdown generation.
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains • Paper • 2507.17746 • Published Jul 23, 2025