Instructions for using axiomlaborg/Cable with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use axiomlaborg/Cable with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="axiomlaborg/Cable")
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("axiomlaborg/Cable", dtype="auto")
```
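Once built, the pipeline can be called on a prompt directly. A minimal usage sketch (the prompt and generation parameters below are illustrative, not from the repository):

```python
# Illustrative call: generate a short continuation with the pipeline.
output = pipe("Once upon a time,", max_new_tokens=50)
print(output[0]["generated_text"])
```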
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use axiomlaborg/Cable with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "axiomlaborg/Cable"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "axiomlaborg/Cable",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
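Because the server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch, assuming the `openai` client package is installed (the `api_key` value is a placeholder; vLLM does not validate it by default):

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="axiomlaborg/Cable",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```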
Use Docker

```bash
docker model run hf.co/axiomlaborg/Cable
```
- SGLang
How to use axiomlaborg/Cable with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "axiomlaborg/Cable" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "axiomlaborg/Cable",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
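The same endpoint can also be queried from Python with a plain HTTP client. A minimal sketch, assuming the `requests` package; the payload mirrors the curl example above:

```python
import requests

# Query the local SGLang server's OpenAI-compatible completions endpoint.
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "axiomlaborg/Cable",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```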
Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "axiomlaborg/Cable" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "axiomlaborg/Cable",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use axiomlaborg/Cable with Docker Model Runner:
```bash
docker model run hf.co/axiomlaborg/Cable
```
Context-aware Biases for Length Extrapolation
This repository contains the source code for Context-aware Biases for Length Extrapolation (Cable).
🚀 News
- [2025.02.03] Code release
Upcoming
- Cleaning codebase
- Adding scripts for training ALiBi, RoPE, T5-bias
Datasets and Models
Download the datasets from HuggingFace and use dataset_preparation.py to save the tokenized dataset.
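For orientation, a minimal sketch of what the tokenization step typically looks like, assuming the `datasets` library and the GPT-2 tokenizer; `dataset_preparation.py` in this repo is the authoritative script and its actual interface may differ:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical sketch: tokenize a text dataset and save it to disk.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"]),
    batched=True,
    remove_columns=dataset.column_names,
)
tokenized.save_to_disk("tokenized_wikitext103")
```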
Some of the trained models:
| Dataset | Model | Parameters | Sequence Length | Checkpoint |
|---|---|---|---|---|
| Fineweb-Edu (10B) | GPT-Medium | 334M | 1024 | |
| Fineweb-Edu (10B) | GPT-Medium | 334M | 512 | |
| WikiText-103 | GPT-Tiny | 44M | 1024 | |
| WikiText-103 | GPT-Tiny | 44M | 512 | |
How to use the models:
```python
from transformers import AutoModel

cable_fineweb_md_1024 = AutoModel.from_pretrained("axiomlaborg/Cable", trust_remote_code=True, revision="cable-edufineweb-md-1024")
cable_fineweb_md_512 = AutoModel.from_pretrained("axiomlaborg/Cable", trust_remote_code=True, revision="cable-edufineweb-md-512")
cable_wiki_tiny_1024 = AutoModel.from_pretrained("axiomlaborg/Cable", trust_remote_code=True, revision="cable-wiki-tiny-1024")
cable_wiki_tiny_512 = AutoModel.from_pretrained("axiomlaborg/Cable", trust_remote_code=True, revision="cable-wiki-tiny-512")
```
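After loading, inference runs as usual. A hedged sketch, assuming the remote code exposes a GPT-2-style causal-LM interface with the GPT-2 tokenizer and logits in its output (an assumption about the custom code, so adjust to the actual interface):

```python
import torch
from transformers import AutoTokenizer

# Assumption: the checkpoints use the GPT-2 tokenizer and return logits.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_ids = tokenizer("Once upon a time,", return_tensors="pt").input_ids

with torch.no_grad():
    outputs = cable_wiki_tiny_1024(input_ids)

# Greedy next-token pick from the final position's logits.
next_token_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```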
Training
Single GPU
```bash
python Cable.py --dataset-dir "path to dataset" --model "medium or small or tiny" --save-dir "dir for logs"
```

Multiple GPUs
```bash
torchrun --standalone --nproc_per_node=2 Cable.py
```
For the HellaSwag benchmark and for evaluating length extrapolation, please use the evaluation.ipynb notebook.
Length Extrapolation
A Cable model trained on T=1024 can extrapolate to T=8192, achieving better performance (PPL = 22.22) than a sinusoidal model (PPL = 22.81) trained on T=8192.
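For reference, perplexity at an extrapolated length can be estimated along these lines. A minimal sketch, assuming a causal-LM-style model that returns logits and a flat tensor of evaluation token ids (names are illustrative; evaluation.ipynb is the authoritative version):

```python
import torch
import torch.nn.functional as F

def eval_ppl(model, token_ids, seq_len=8192):
    # Slide over the evaluation stream in non-overlapping windows and
    # average next-token cross-entropy; PPL = exp(mean loss).
    losses = []
    for start in range(0, token_ids.size(0) - seq_len, seq_len):
        chunk = token_ids[start : start + seq_len + 1]
        inputs, targets = chunk[:-1].unsqueeze(0), chunk[1:].unsqueeze(0)
        with torch.no_grad():
            logits = model(inputs).logits  # assumed HF-style output
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        losses.append(loss.item())
    return torch.exp(torch.tensor(losses).mean()).item()
```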
Runtime and Memory Overhead
Cable significantly improves the model's extrapolation ability with negligible time and memory overhead compared to the vanilla transformer. Furthermore, compared to existing RPE methods, our approach maintains nearly identical training time and GPU memory usage, while its inference overhead remains negligible or comparable, depending on the sequence length.
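Such overhead claims can be sanity-checked with a simple micro-benchmark. A minimal sketch, assuming a CUDA device and an already loaded model (illustrative; this is not the paper's benchmarking code):

```python
import torch

def measure_step(model, input_ids, n_iters=20):
    # Time forward passes with CUDA events and report peak GPU memory.
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    with torch.no_grad():
        for _ in range(n_iters):
            model(input_ids)
    end.record()
    torch.cuda.synchronize()
    ms_per_iter = start.elapsed_time(end) / n_iters
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    return ms_per_iter, peak_gib
```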
Citation
If you use this repository for your research or wish to refer to our positional encoding method, please use the following BibTeX entry:
```bibtex
@article{veisi2025context,
  title={Context-aware Biases for Length Extrapolation},
  author={Ali Veisi and Amir Mansourian},
  journal={arXiv preprint arXiv:2503.08067},
  year={2025}
}
```
Acknowledgement
This repo is based on Karpathy/Build-NanoGPT. Thanks for their excellent work.
Model tree for axiomlaborg/Cable
Base model: openai-community/gpt2