# Kite
🎉 You are looking at Kite 3.1, which now uses a corrected attention head dimension!
Kite is a small language model with 20 million parameters, trained without any special optimizations.
## Training
It was trained for 1 epoch on 50K rows of FineWeb Edu with a batch size of 4, a learning rate of 1.5e-4, and the pika 3 tokenizer.
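The hyperparameters above can be collected into a config object; this is a hypothetical sketch, not the actual training code — only the values (rows, epochs, batch size, learning rate, tokenizer name) come from this card, and the class and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hypothetical container for Kite's stated training hyperparameters."""
    num_rows: int = 50_000          # rows of FineWeb Edu
    epochs: int = 1
    batch_size: int = 4
    learning_rate: float = 1.5e-4
    tokenizer: str = "pika 3"

cfg = TrainConfig()
# With these values, one epoch is num_rows / batch_size optimizer steps.
steps = cfg.num_rows // cfg.batch_size * cfg.epochs
print(steps)  # 12500
```

At batch size 4, the run works out to 12,500 optimizer steps over the 50K rows.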
## Limitations
Due to its size, the model is not suitable for production workloads.