Ternary LLMs & Knowledge distillation & SOTA
• Addition is All You Need for Energy-efficient Language Models (arXiv:2410.00907); see the integer-addition sketch after this list
• The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764); see the absmean quantizer sketch below
• LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding (arXiv:2404.16710); see the early-exit sketch below
• Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory (arXiv:2405.08707)
• Token-Scaled Logit Distillation for Ternary Weight Generative Language Models (arXiv:2308.06744); see the distillation-loss sketch below
• TerDiT: Ternary Diffusion Models with Transformers (arXiv:2405.14854)
• Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (arXiv:2405.12981); see the shared-KV sketch below
• You Only Cache Once: Decoder-Decoder Architectures for Language Models (arXiv:2405.05254)
• Differential Transformer (arXiv:2410.05258); see the differential-attention sketch below
• BitNet a4.8: 4-bit Activations for 1-bit LLMs (arXiv:2411.04965); see the activation-quantizer sketch below
• Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089); see the block-selection sketch below
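For arXiv:2410.00907, the core observation is that for positive floats, integer addition of the raw bit patterns approximates multiplication. A minimal sketch, assuming fp32 and the classic Mitchell-style approximation that the paper's L-Mul refines with an extra mantissa correction; `add_as_mul` is an illustrative name:

```python
import struct

def add_as_mul(x: float, y: float) -> float:
    """Mitchell-style trick: adding the bit patterns of two positive
    floats approximates their product. L-Mul (arXiv:2410.00907) refines
    this with a mantissa correction term; the paper's exact algorithm
    differs in detail."""
    ix = struct.unpack("<I", struct.pack("<f", x))[0]
    iy = struct.unpack("<I", struct.pack("<f", y))[0]
    iz = ix + iy - 0x3F800000  # subtract the bit pattern of 1.0 so the exponent bias counts once
    return struct.unpack("<f", struct.pack("<I", iz))[0]

print(add_as_mul(3.0, 5.0))  # ~14.0 vs. the exact 15.0; worst-case relative error ~11%
```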
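For arXiv:2402.17764, a minimal sketch of the absmean ternary quantizer the BitNet b1.58 paper describes; straight-through gradient estimation and the full training recipe are omitted:

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Absmean quantizer from the b1.58 paper (arXiv:2402.17764): scale
    weights by their mean absolute value, then round and clip to
    {-1, 0, +1}."""
    gamma = w.abs().mean()                         # per-tensor absmean scale
    w_q = (w / (gamma + eps)).round().clamp(-1, 1)
    return w_q, gamma                              # effective weight: w_q * gamma
```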
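For arXiv:2404.16710, a toy early-exit forward pass; the argument names are illustrative, not a real library API, and assume a model trained for early exit as LayerSkip proposes:

```python
import torch

@torch.no_grad()
def early_exit_logits(blocks, final_norm, lm_head, x, exit_layer: int):
    """Run only the first `exit_layer` transformer blocks, then reuse
    the shared LM head for draft logits, in the spirit of LayerSkip
    (arXiv:2404.16710); the remaining layers can verify the drafts
    (self-speculative decoding)."""
    for block in blocks[:exit_layer]:
        x = block(x)
    return lm_head(final_norm(x))
```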
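For arXiv:2308.06744, a generic sketch of per-token-weighted logit distillation; the confidence-based weighting below is an assumption for illustration, and the paper's exact scaling rule may differ:

```python
import torch
import torch.nn.functional as F

def token_scaled_kd_loss(student_logits, teacher_logits, T: float = 1.0):
    """Per-token-weighted logit distillation, a generic reading of the
    idea named in arXiv:2308.06744. Tokens are weighted by teacher
    confidence here. Logit shapes: (batch, seq_len, vocab)."""
    t_logp = F.log_softmax(teacher_logits / T, dim=-1)
    s_logp = F.log_softmax(student_logits / T, dim=-1)
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(-1)  # per-token KL, (batch, seq)
    conf = t_logp.exp().max(-1).values               # teacher confidence per token
    w = conf / conf.sum(-1, keepdim=True)            # normalize over the sequence
    return (w * kl).sum(-1).mean() * T**2
```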
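For arXiv:2405.12981, a toy pair of decoder layers sharing one K/V projection so only one layer's keys and values enter the cache, which is the core of cross-layer attention; the module layout (no MLPs or norms) is illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLAPair(nn.Module):
    """Two decoder layers sharing one K/V projection, sketching
    Cross-Layer Attention (arXiv:2405.12981): one set of K/V is cached
    and reused, halving KV-cache size for the pair."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.kv_proj = nn.Linear(d_model, 2 * d_model)  # shared across both layers
        self.q_proj_a = nn.Linear(d_model, d_model)
        self.q_proj_b = nn.Linear(d_model, d_model)

    def _attend(self, q, k, v):
        b, t, d = q.shape
        h = self.n_heads
        qh, kh, vh = (z.view(b, t, h, d // h).transpose(1, 2) for z in (q, k, v))
        out = F.scaled_dot_product_attention(qh, kh, vh, is_causal=True)
        return out.transpose(1, 2).reshape(b, t, d)

    def forward(self, x):
        k, v = self.kv_proj(x).chunk(2, dim=-1)        # computed (and cached) once
        x = x + self._attend(self.q_proj_a(x), k, v)
        x = x + self._attend(self.q_proj_b(x), k, v)   # layer B reuses layer A's K/V
        return x
```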
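For arXiv:2410.05258, a single-head sketch of differential attention as the difference of two softmax maps; the paper's learned lambda reparameterization and per-head normalization are omitted:

```python
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam: float):
    """Differential attention (arXiv:2410.05258): subtracting a second
    softmax map cancels common-mode attention noise.
    q*, k*: (batch, seq, d); v: (batch, seq, d_v)."""
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v
```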
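For arXiv:2411.04965, a generic per-token absmax INT4 activation quantizer; BitNet a4.8 itself uses a more involved hybrid quantization/sparsification scheme, so this shows only the basic step:

```python
import torch

def absmax_int4(x: torch.Tensor, eps: float = 1e-5):
    """Per-token absmax quantization of activations to INT4 ([-8, 7]),
    the kind of quantizer BitNet a4.8 (arXiv:2411.04965) builds on.
    x: (..., hidden)."""
    scale = x.abs().amax(dim=-1, keepdim=True).clamp_min(eps) / 7.0
    x_q = (x / scale).round().clamp(-8, 7)
    return x_q, scale  # dequantize as x_q * scale
```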
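For arXiv:2502.11089, a sketch of per-query top-k KV-block selection; scoring blocks with mean-pooled keys is a simplification here, since the paper derives block scores from its compression branch and combines selection with compressed and sliding-window attention:

```python
import torch

def topk_kv_blocks(q, k, block_size: int = 64, top_k: int = 4):
    """Block-selection step in the style of Native Sparse Attention
    (arXiv:2502.11089): score contiguous KV blocks per query and keep
    the top-k for sparse attention.
    q: (batch, q_len, d); k: (batch, kv_len, d), kv_len % block_size == 0."""
    b, n, d = k.shape
    pooled = k.view(b, n // block_size, block_size, d).mean(dim=2)
    scores = q @ pooled.transpose(-1, -2)        # (batch, q_len, n_blocks)
    return scores.topk(top_k, dim=-1).indices    # block ids to gather K/V from
```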