---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
- polypythias
- gpt-neox
license: apache-2.0
datasets:
- EleutherAI/pile
- EleutherAI/pile-preshuffled-seeds
---

# Pythia-410M-seed8 GPT-NeoX Checkpoints

This repository contains the raw [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) training checkpoints for [Pythia-410M-seed8](https://huggingface.co/EleutherAI/pythia-410m-seed8), part of the [PolyPythias](https://huggingface.co/collections/EleutherAI/polypythias) suite. These are the native checkpoint files produced during training, stored in DeepSpeed's checkpoint format.

**If you want to perform inference**, use the HuggingFace Transformers-compatible weights at [`EleutherAI/pythia-410m-seed8`](https://huggingface.co/EleutherAI/pythia-410m-seed8) instead. This repository is intended for research that requires access to optimizer states or the original training format.

## Contents

Each branch contains a full training checkpoint at a given step, including:

- `layer_XX-model_00-model_states.pt` — model weight shards (one per layer)
- `mp_rank_00_model_states.pt` — model state metadata
- `zero_pp_rank_*_optim_states.pt` — ZeRO optimizer states (Adam moments, etc.)
- `410M.yml` — GPT-NeoX training configuration

A sketch for inspecting these files appears at the end of the Training Details section below.

## Branches

154 checkpoints are available as branches:

- `step0` — initialization
- `step{1,2,4,8,16,32,64,128,256,512}` — log-spaced early checkpoints
- `step1000` through `step143000` — every 1,000 steps

Branch `step143000` corresponds to the final model. A sketch for downloading a single branch appears under Training Details below.

## Converting to HuggingFace Format

To convert a checkpoint to HuggingFace Transformers format, use the conversion script from [GPT-NeoX](https://github.com/EleutherAI/gpt-neox):

```bash
python tools/convert_neox_to_hf.py \
  --input_dir /path/to/neox/checkpoint \
  --config_file /path/to/config.yml \
  --output_dir /path/to/hf/output
```

Pre-converted weights for all checkpoints are available at [`EleutherAI/pythia-410m-seed8`](https://huggingface.co/EleutherAI/pythia-410m-seed8); a loading sketch appears below.

## Training Details

This model was trained on [the Pile](https://pile.eleuther.ai/) using a pre-shuffled data ordering specific to this seed. The shuffled index files are available at [`EleutherAI/pile-preshuffled-seeds`](https://huggingface.co/datasets/EleutherAI/pile-preshuffled-seeds).

All PolyPythias models were trained for 143,000 steps with a batch size of 2M tokens (2,097,152 tokens per step), seeing a total of 299,892,736,000 tokens. See the [PolyPythias paper](https://arxiv.org/abs/2503.09543) and the [Pythia GitHub repository](https://github.com/EleutherAI/pythia) for full training details.
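As a quick sanity check, the total token count follows directly from the step count and per-step batch size quoted above:

```python
# Total-token arithmetic from the figures quoted in this card.
steps = 143_000
tokens_per_step = 2_097_152  # 2M-token batches (1024 sequences of 2048 tokens)
print(f"{steps * tokens_per_step:,}")  # 299,892,736,000
```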
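Because each checkpoint lives on its own branch, a single step can be fetched by passing the branch name as the `revision`. A minimal sketch using `huggingface_hub`; the repo ID below is a placeholder, substitute this repository's actual ID:

```python
from huggingface_hub import snapshot_download

# Download one checkpoint branch; repo_id is a placeholder for this repository.
local_dir = snapshot_download(
    repo_id="EleutherAI/<this-repo>",  # hypothetical -- replace with the real ID
    revision="step1000",  # any branch listed above, step0 ... step143000
)
print(local_dir)
```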
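The layer shards listed in the Contents section can be peeked at with plain `torch.load`. This sketch assumes each shard deserializes to a flat dict of parameter name → tensor; the exact layout can vary across GPT-NeoX/DeepSpeed versions:

```python
import glob
import torch

# Point at a downloaded checkpoint directory (e.g. the snapshot_download result above).
ckpt_dir = "/path/to/neox/checkpoint"

for shard in sorted(glob.glob(f"{ckpt_dir}/layer_*-model_00-model_states.pt")):
    # weights_only=False because these are pickled DeepSpeed state dicts,
    # not plain tensor archives.
    state = torch.load(shard, map_location="cpu", weights_only=False)
    for name, tensor in state.items():
        print(shard.split("/")[-1], name, tuple(tensor.shape))
```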
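Once a checkpoint has been converted (or when using the pre-converted repository mentioned above), it loads like any other causal LM. A sketch, assuming the converted repo exposes the same `stepN` branches that the main Pythia repositories do:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pre-converted weights; revision selects the training step (assumed branch layout).
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-410m-seed8", revision="step143000"
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m-seed8")

inputs = tokenizer("Hello, world", return_tensors="pt")
print(model(**inputs).logits.shape)  # (1, seq_len, vocab_size)
```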
The PolyPythias suite covers the following model sizes:

| Model Size | Parameters | Layers | Model Dim | Heads | Original Model |
| ---------: | ---------: | :----: | :-------: | :---: | :------------: |
| 14M | 14M | 6 | 128 | 4 | [pythia-14m](https://huggingface.co/EleutherAI/pythia-14m) |
| 31M | 31M | 6 | 256 | 8 | [pythia-31m](https://huggingface.co/EleutherAI/pythia-31m) |
| 70M | 70M | 6 | 512 | 8 | [pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) |
| 160M | 160M | 12 | 768 | 12 | [pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) |
| 410M | 410M | 24 | 1024 | 16 | [pythia-410m](https://huggingface.co/EleutherAI/pythia-410m) |
## About PolyPythias

PolyPythias is an extension of the Pythia project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models. The 160M size also includes decoupled variants (`data-seed` and `weight-seed`) that isolate the effects of data ordering vs. weight initialization.

The complete collection is available at [EleutherAI/polypythias](https://huggingface.co/collections/EleutherAI/polypythias).

## Citation

```bibtex
@inproceedings{vanderwal2025polypythias,
  title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
  author={van der Wal, Oskar and Lesci, Pietro and Muller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
  booktitle={International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/abs/2503.09543}
}
```