This model was created for the paper "Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish", TACL 2026.

Citation

@misc{kunz2026preferencesidiomaticlanguageacquired,
      title={Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish}, 
      author={Jenny Kunz},
      year={2026},
      eprint={2602.03484},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.03484}, 
}

Training

This is a SmolLM2-135M model continually pre-trained on the Swedish portion of FineWeb-2, with the following hyperparameters (a sketch of the corresponding training setup follows the list):

  • Epochs: 1
  • Learning rate: 5e-4
  • LR scheduler: cosine
  • Warmup ratio: 0.05
  • Per-device batch size: 1
  • Hardware: 4× A100 (40 GB) GPUs
  • Gradient accumulation steps: 64
  • Effective batch size: 256 (1 × 4 × 64)
  • Max. context length: 8192 tokens
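
For reference, the sketch below shows how these settings map onto a Hugging Face Trainer configuration. It is a hedged reconstruction, not the script used for the paper: the FineWeb-2 subset name ("swe_Latn"), the document packing, and the output path are assumptions; only the hyperparameters mirror the list above.

```python
# Sketch of the continued pre-training setup (assumptions noted inline).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "HuggingFaceTB/SmolLM2-135M"
CTX = 8192  # max. context length from the list above

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# "swe_Latn" is the assumed FineWeb-2 subset name for Swedish; in practice
# a corpus this size would likely be streamed rather than fully downloaded.
raw = load_dataset("HuggingFaceFW/fineweb-2", name="swe_Latn", split="train")

def tokenize_and_pack(batch):
    # Concatenate documents (EOS-separated) and cut into 8192-token blocks;
    # the remainder of each batch is dropped for simplicity.
    ids = tokenizer(batch["text"])["input_ids"]
    flat = [tok for doc in ids for tok in doc + [tokenizer.eos_token_id]]
    n = (len(flat) // CTX) * CTX
    return {"input_ids": [flat[i : i + CTX] for i in range(0, n, CTX)]}

train_ds = raw.map(tokenize_and_pack, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="smollm-135m-cpt-fineweb-swedish",  # hypothetical path
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,   # x 4 GPUs x 64 accumulation steps = 256
    gradient_accumulation_steps=64,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Launching the script with `torchrun --nproc_per_node=4` (or `accelerate launch`) distributes the run over the 4 GPUs.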

Limitations

This is a research model intended for studying pre-training dynamics, and I do not recommend it for any practical use. It is trained on a web corpus and no alignment whatsoever has been performed, so the model will likely reflect the biases of its training data and hallucinate frequently.
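
That said, loading the checkpoint for research use (e.g., inspecting its Swedish generations) works like any causal LM on the Hub; the prompt and sampling settings below are only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jekunz/smollm-135m-cpt-fineweb-swedish"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# "Stockholm är" = "Stockholm is"; sampling settings are arbitrary.
inputs = tokenizer("Stockholm är", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```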
