CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection Paper • 2605.16839 • Published 22 days ago • 13
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Paper • 2602.23881 • Published Feb 27 • 18
RelayGen: Intra-Generation Model Switching for Efficient Reasoning Paper • 2602.06454 • Published Feb 6 • 12
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection Paper • 2602.03216 • Published Feb 3 • 13
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7, 2024 • 5
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents Paper • 2602.01053 • Published Feb 1 • 8