Long Context Pre-Training with Lighthouse Attention • Paper • 2605.06554 • Published 10 days ago • 19
Efficient Pre-Training with Token Superposition • Paper • 2605.06546 • Published 10 days ago • 36