Instructions to use josephmayo/ZAYA1-8B-Coder with libraries, inference providers, notebooks, and local apps.

Libraries

How to use josephmayo/ZAYA1-8B-Coder with Transformers:

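# Use a pipeline as a high-level helper
# Note: ZAYA uses a custom model_type, so this may require Zyphra's Transformers fork (see Architecture Notes)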
from transformers import pipeline

pipe = pipeline("text-generation", model="josephmayo/ZAYA1-8B-Coder")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("josephmayo/ZAYA1-8B-Coder", dtype="auto")

vLLM

How to use josephmayo/ZAYA1-8B-Coder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "josephmayo/ZAYA1-8B-Coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "josephmayo/ZAYA1-8B-Coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/josephmayo/ZAYA1-8B-Coder

SGLang

How to use josephmayo/ZAYA1-8B-Coder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "josephmayo/ZAYA1-8B-Coder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "josephmayo/ZAYA1-8B-Coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "josephmayo/ZAYA1-8B-Coder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "josephmayo/ZAYA1-8B-Coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

ZAYA1-8B Coder

This is a merged coding model: the LoRA adapter josephmayo/ZAYA1-8B-Coder-LoRA merged into the base model Zyphra/ZAYA1-8B. The repo contains the merged weights as standard safetensors shards, so no separate adapter is needed at load time.

Evaluation Gate

The adapter was evaluated against the base model on 50 Python code-generation prompts with a 0-10 heuristic score:

Base average: 2.36 / 10
LoRA average: 4.76 / 10
Absolute score delta: +2.40 / 10
Full-scale lift: 24.00%
Relative lift over base average: 101.69%
Improved prompts: 39 / 50
Merge threshold: 20.00%
Merge decision: true

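Full-scale lift is the metric the evaluation notebook requires for the merge gate. It expresses the average score delta as a percentage of the full 10-point scale: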

((lora_avg - base_avg) / 10) * 100
((4.76 - 2.36) / 10) * 100 = 24.00%
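
The two lift figures and the gate decision can be reproduced from the averages above. This is a minimal sketch, not the notebook's code: the function names are hypothetical, and comparing with `>=` against the 20.00% threshold is an assumption (the source only states that the gate passed).

```python
def full_scale_lift(base_avg: float, lora_avg: float, scale: float = 10.0) -> float:
    """Score delta as a percentage of the full scoring scale."""
    return (lora_avg - base_avg) / scale * 100


def relative_lift(base_avg: float, lora_avg: float) -> float:
    """Score delta as a percentage of the base average."""
    return (lora_avg - base_avg) / base_avg * 100


base_avg, lora_avg = 2.36, 4.76
threshold = 20.0  # merge threshold in percent; the >= comparison is an assumption

lift = full_scale_lift(base_avg, lora_avg)
print(f"Full-scale lift: {lift:.2f}%")
print(f"Relative lift:   {relative_lift(base_avg, lora_avg):.2f}%")
print(f"Merge decision:  {lift >= threshold}")
```

The two metrics differ only in the denominator: full-scale lift divides by the 10-point scale, while relative lift divides by the base average, which is why a low base score makes the relative figure look much larger.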

Scoring Heuristic

Each response was scored out of 10:

def present: 2 points
class present: 1 point
return present: 1 point
import or from present: 1 point
fenced code block present: 1 point
output length greater than 100 characters: 1 point
Python AST parse validity: 3 points
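
The rubric above could be implemented roughly as follows. This is a sketch under assumptions rather than the scorer actually used: the function names are hypothetical, keywords are matched as whole words anywhere in the response, and the AST check runs on the first fenced block if one exists and on the whole response otherwise.

```python
import ast
import re

FENCE = "`" * 3  # triple backtick, built this way so the example stays readable


def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the whole response if none."""
    pattern = re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1) if match else response


def parses_as_python(code: str) -> bool:
    """True if the text is syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def score_response(response: str) -> int:
    """Apply the 10-point rubric: keyword checks, fence, length, AST validity."""
    score = 0
    score += 2 if re.search(r"\bdef\b", response) else 0
    score += 1 if re.search(r"\bclass\b", response) else 0
    score += 1 if re.search(r"\breturn\b", response) else 0
    score += 1 if re.search(r"\b(import|from)\b", response) else 0
    score += 1 if FENCE in response else 0
    score += 1 if len(response) > 100 else 0
    score += 3 if parses_as_python(extract_code(response)) else 0
    return score


sample = FENCE + "python\nimport math\n\ndef area(r):\n    return math.pi * r * r\n" + FENCE
print(score_response(sample))
```

Because every check is structural, the score measures whether a response looks like parseable Python, not whether the code solves the prompt.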

Architecture Notes

ZAYA uses a custom model_type = zaya; it is not weight-compatible with LlamaForCausalLM despite similar naming in some configs. During evaluation and merge, the real ZAYA architecture was loaded using Zyphra's Transformers implementation:

pip install git+https://github.com/Zyphra/transformers.git@zaya1

The LoRA adapter contains 160 tensors targeting:

self_attn.o_proj
zaya_block.router.down_proj
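
One way to check which modules an adapter touches is to group its tensor names by target module. The sketch below assumes a plain list of names; the example keys are hypothetical and only imitate a common adapter naming pattern, so the real key layout in this adapter may differ.

```python
from collections import Counter

TARGET_MODULES = ("self_attn.o_proj", "zaya_block.router.down_proj")


def count_by_target(tensor_names):
    """Count adapter tensors per target module; unmatched names go to 'other'."""
    counts = Counter()
    for name in tensor_names:
        for target in TARGET_MODULES:
            if f".{target}." in name:
                counts[target] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Hypothetical tensor names, for illustration only.
names = [
    "base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight",
    "base_model.model.model.layers.0.self_attn.o_proj.lora_B.weight",
    "base_model.model.model.layers.0.zaya_block.router.down_proj.lora_A.weight",
]
print(count_by_target(names))
```

With a real adapter file, the names could be read from the safetensors file's key list and passed in unchanged. A non-zero "other" count would indicate tensors outside the two listed modules.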

The merge was performed after the evaluation gate passed. The merged model was then saved as safetensors shards, together with the tokenizer and generation config.

Evaluation artifacts are included under eval/:

eval/eval_summary.json
eval/score_table.csv
eval/base_outputs.jsonl
eval/lora_outputs.jsonl
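
A per-prompt score table like eval/score_table.csv can be summarized with the standard library. This is a sketch only: the column names `base_score` and `lora_score` are assumptions about the file's layout, and the inline rows are made-up sample data, not the published results.

```python
import csv
import io


def summarize_scores(csv_file):
    """Compute averages and the number of prompts where the LoRA score beat the base."""
    rows = list(csv.DictReader(csv_file))
    base = [float(r["base_score"]) for r in rows]  # assumed column name
    lora = [float(r["lora_score"]) for r in rows]  # assumed column name
    return {
        "prompts": len(rows),
        "base_avg": sum(base) / len(rows),
        "lora_avg": sum(lora) / len(rows),
        "improved": sum(l > b for b, l in zip(base, lora)),
    }


# Made-up rows for illustration; replace with open("eval/score_table.csv", newline="").
sample = io.StringIO("prompt_id,base_score,lora_score\n1,2,5\n2,4,4\n3,3,6\n")
summary = summarize_scores(sample)
print(summary)
```

Run against the real file (after adjusting the column names to match it), the summary should agree with the averages and improved-prompt count reported in the Evaluation Gate section.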

Included Files

model-00001-of-00005.safetensors through model-00005-of-00005.safetensors
model.safetensors.index.json
config.json
generation_config.json
tokenizer files
zaya_patched_config.json
evaluation outputs under eval/

The GGUF quantized release is available at josephmayo/ZAYA1-8B-Coder-GGUF.

Safetensors

Model size: 9B params
Tensor type: F32, F16

Model tree for josephmayo/ZAYA1-8B-Coder

Base model: Zyphra/ZAYA1-base
Finetuned: Zyphra/ZAYA1-reasoning-base
Finetuned: Zyphra/ZAYA1-8B
Adapter (3): this model
Quantizations: 1 model
