view article Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +5 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra • 1 day ago • 11
RFDetr Collection RF-DETR checkpoints converted to be used with 🤗 Transformers • 15 items • Updated about 18 hours ago • 11
🧬 Carbon Collection Carbon 500M, 3B, 8B genomic models and GGUF variants for llama.cpp • 6 items • Updated 7 days ago • 36
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 340
view article Article Unlocking asynchronicity in continuous batching +1 ror, pcuenq, ariG23498 • 14 days ago • 55
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 901
view article Article TRL v1.0: Post-Training Library Built to Move with the Field +2 qgallouedec, stevhliu, pcuenq, sergiopaniego • Mar 31 • 53
view article Article SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation OpenMed • Mar 23 • 17
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
view article Article Ulysses Sequence Parallelism: Training with Million-Token Contexts kashif, stas • Mar 9 • 30
view article Article Introducing Storage Buckets on the Hugging Face Hub +10 Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner • Mar 10 • 195
view article Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 157
view article Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 164