---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
- polypythias
- gpt-neox
license: apache-2.0
datasets:
- EleutherAI/pile
- EleutherAI/pile-preshuffled-seeds
---

# Pythia-410M-seed8 GPT-NeoX Checkpoints

This repository contains the raw [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) training checkpoints for [Pythia-410M-seed8](https://huggingface.co/EleutherAI/pythia-410m-seed8), part of the [PolyPythias](https://huggingface.co/collections/EleutherAI/polypythias) suite. These are the native checkpoint files produced during training, stored in DeepSpeed's checkpoint format.

**If you want to perform inference**, use the HuggingFace Transformers-compatible weights at [`EleutherAI/pythia-410m-seed8`](https://huggingface.co/EleutherAI/pythia-410m-seed8) instead. This repository is intended for research that requires access to optimizer states or the original training format.

## Contents

Each branch contains a full training checkpoint at a given step, including:

- `layer_XX-model_00-model_states.pt` — model weight shards (one per layer)
- `mp_rank_00_model_states.pt` — model state metadata
- `zero_pp_rank_*_optim_states.pt` — ZeRO optimizer states (Adam moments, etc.)
- `410M.yml` — GPT-NeoX training configuration

A sketch for inspecting these files appears at the end of the Training Details section below.

## Branches

154 checkpoints are available as branches:

- `step0` — initialization
- `step{1,2,4,8,16,32,64,128,256,512}` — log-spaced early checkpoints
- `step1000` through `step143000` — every 1,000 steps

Branch `step143000` corresponds to the final model. A sketch for downloading a single branch appears under Training Details below.

## Converting to HuggingFace Format

To convert a checkpoint to HuggingFace Transformers format, use the conversion script from [GPT-NeoX](https://github.com/EleutherAI/gpt-neox):

```bash
python tools/convert_neox_to_hf.py \
  --input_dir /path/to/neox/checkpoint \
  --config_file /path/to/config.yml \
  --output_dir /path/to/hf/output
```

Pre-converted weights for all checkpoints are available at [`EleutherAI/pythia-410m-seed8`](https://huggingface.co/EleutherAI/pythia-410m-seed8); a loading sketch appears below.

## Training Details

This model was trained on [the Pile](https://pile.eleuther.ai/) using a pre-shuffled data ordering specific to this seed. The shuffled index files are available at [`EleutherAI/pile-preshuffled-seeds`](https://huggingface.co/datasets/EleutherAI/pile-preshuffled-seeds).

All PolyPythias models were trained for 143,000 steps with a batch size of 2M tokens (2,097,152 tokens per step), seeing a total of 299,892,736,000 tokens. See the [PolyPythias paper](https://arxiv.org/abs/2503.09543) and the [Pythia GitHub repository](https://github.com/EleutherAI/pythia) for full training details.
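As a quick sanity check, the total token count follows directly from the step count and per-step batch size quoted above:

```python
# Total-token arithmetic from the figures quoted in this card.
steps = 143_000
tokens_per_step = 2_097_152  # 2M-token batches (1024 sequences of 2048 tokens)
print(f"{steps * tokens_per_step:,}")  # 299,892,736,000
```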
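Because each checkpoint lives on its own branch, a single step can be fetched by passing the branch name as the `revision`. A minimal sketch using `huggingface_hub`; the repo ID below is a placeholder, substitute this repository's actual ID:

```python
from huggingface_hub import snapshot_download

# Download one checkpoint branch; repo_id is a placeholder for this repository.
local_dir = snapshot_download(
    repo_id="EleutherAI/<this-repo>",  # hypothetical -- replace with the real ID
    revision="step1000",  # any branch listed above, step0 ... step143000
)
print(local_dir)
```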
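The layer shards listed in the Contents section can be peeked at with plain `torch.load`. This sketch assumes each shard deserializes to a flat dict of parameter name → tensor; the exact layout can vary across GPT-NeoX/DeepSpeed versions:

```python
import glob
import torch

# Point at a downloaded checkpoint directory (e.g. the snapshot_download result above).
ckpt_dir = "/path/to/neox/checkpoint"

for shard in sorted(glob.glob(f"{ckpt_dir}/layer_*-model_00-model_states.pt")):
    # weights_only=False because these are pickled DeepSpeed state dicts,
    # not plain tensor archives.
    state = torch.load(shard, map_location="cpu", weights_only=False)
    for name, tensor in state.items():
        print(shard.split("/")[-1], name, tuple(tensor.shape))
```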
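Once a checkpoint has been converted (or when using the pre-converted repository mentioned above), it loads like any other causal LM. A sketch, assuming the converted repo exposes the same `stepN` branches that the main Pythia repositories do:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pre-converted weights; revision selects the training step (assumed branch layout).
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-410m-seed8", revision="step143000"
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m-seed8")

inputs = tokenizer("Hello, world", return_tensors="pt")
print(model(**inputs).logits.shape)  # (1, seq_len, vocab_size)
```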
The PolyPythias suite covers the following model sizes:

| Model Size | Parameters | Layers | Model Dim | Heads | Original Model |
| ---------: | ---------: | :----: | :-------: | :---: | :------------: |
| 14M | 14M | 6 | 128 | 4 | [pythia-14m](https://huggingface.co/EleutherAI/pythia-14m) |
| 31M | 31M | 6 | 256 | 8 | [pythia-31m](https://huggingface.co/EleutherAI/pythia-31m) |
| 70M | 70M | 6 | 512 | 8 | [pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) |
| 160M | 160M | 12 | 768 | 12 | [pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) |
| 410M | 410M | 24 | 1024 | 16 | [pythia-410m](https://huggingface.co/EleutherAI/pythia-410m) |
## About PolyPythias

PolyPythias is an extension of the Pythia project providing 45 additional training runs across 5 model sizes with 9 different random seeds each. These models enable systematic study of training stability and reproducibility in language models. The 160M size also includes decoupled variants (`data-seed` and `weight-seed`) that isolate the effects of data ordering vs. weight initialization.

The complete collection is available at [EleutherAI/polypythias](https://huggingface.co/collections/EleutherAI/polypythias).

## Citation

```bibtex
@inproceedings{vanderwal2025polypythias,
  title={PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs},
  author={van der Wal, Oskar and Lesci, Pietro and Muller-Eberstein, Max and Saphra, Naomi and Schoelkopf, Hailey and Zuidema, Willem and Biderman, Stella},
  booktitle={International Conference on Learning Representations},
  year={2025},
  url={https://arxiv.org/abs/2503.09543}
}
```