bitmamba-zen-distillix-300m

This is a 1.58-bit BitMamba (Zen) model trained via Superposition Distillation.

Model Details

  • Teacher: Qwen/Qwen2.5-Coder-1.5B-Instruct
  • Architecture: Mamba SSM (Zen Variant) + BitLinear (1.58-bit weights)
  • Params: ~278M
  • Training Data: CodeAlpaca (Distilled)
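The "1.58-bit" in BitLinear refers to ternary weights: each weight is constrained to {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information. The model card does not spell out the exact quantizer, so the sketch below shows the common BitNet-b1.58-style absmean scheme as an illustration (the function name `absmean_ternary_quantize` is ours, not from this repo):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Illustrative ternary quantizer (BitNet b1.58 style, not necessarily
    this repo's exact recipe): scale by the mean absolute weight, then
    round and clip each entry to {-1, 0, +1}."""
    scale = np.abs(w).mean() + eps          # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
    return q, scale

# Example: a small weight matrix collapses to ternary codes plus one scale.
w = np.array([[0.9, -0.05, -1.2],
              [0.1,  0.4,  -0.3]])
q, s = absmean_ternary_quantize(w)
# q contains only -1, 0, +1; w is approximated by q * s at matmul time.
```

At inference time the matmul can then be computed against the ternary codes and rescaled once by `s`, which is where the memory and compute savings come from.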

Usage

This model requires the custom BitMambaStudent class definition to run; it cannot be loaded with a stock transformers AutoModel. It was trained as a proof of concept for Superposition Distillation.
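The card names Superposition Distillation but does not describe the procedure, so as a reference point the sketch below shows only the conventional teacher-student component of knowledge distillation: matching the student's temperature-softened output distribution to the teacher's via KL divergence. The function names and the temperature value are illustrative assumptions, not this repo's training code:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits: np.ndarray,
               student_logits: np.ndarray,
               temperature: float = 2.0) -> float:
    """Standard logit-distillation term (illustration only): mean KL from
    the teacher's softened distribution to the student's, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)
```

The loss is zero when student and teacher logits agree and grows as the student's distribution drifts from the teacher's; whatever Superposition Distillation adds on top, a term of this shape is the usual starting point.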
