Instructions for using arcee-ai/Trinity-Nano-Preview-MLX-8bit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("arcee-ai/Trinity-Nano-Preview-MLX-8bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
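The single-turn example above can be extended to a multi-turn conversation by keeping the message list and re-applying the chat template on every turn. The following is a minimal sketch, not part of the original card: `append_turn` and `chat_once` are hypothetical helper names, and `max_tokens=256` is an arbitrary choice. The `mlx_lm` import is deferred so the message helper can be used without the library installed.

```python
def append_turn(messages, role, content):
    """Return a new message list with one more chat turn appended."""
    return messages + [{"role": role, "content": content}]


def chat_once(model, tokenizer, messages, user_text, max_tokens=256):
    """Send one user turn and return (reply, updated_messages).

    `model` and `tokenizer` are the objects returned by mlx_lm.load().
    """
    # Imported lazily so append_turn() works without mlx-lm installed.
    from mlx_lm import generate

    messages = append_turn(messages, "user", user_text)
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    reply = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
    # Store the assistant reply so the next turn sees the full history.
    return reply, append_turn(messages, "assistant", reply)
```

A session would start with `history = []` and call `reply, history = chat_once(model, tokenizer, history, "Write a story about Einstein")` once per user message.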

Notebooks
Google Colab
Kaggle
Local Apps
LM Studio

Pi

How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
        }
      ]
    }
  }
}
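If `~/.pi/agent/models.json` already lists other providers, the entry can be merged in rather than pasted over the file. The sketch below is one way to do that with the Python standard library; the helper names (`mlx_provider`, `merge_provider`, `update_models_file`) are hypothetical, and the only schema it assumes is the one shown in the JSON above.

```python
import json
from pathlib import Path

MODEL_ID = "arcee-ai/Trinity-Nano-Preview-MLX-8bit"


def mlx_provider(model_id=MODEL_ID, base_url="http://localhost:8080/v1"):
    """Build the provider entry shown in the JSON above."""
    return {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }


def merge_provider(config, name, provider):
    """Return a copy of config with one provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path="~/.pi/agent/models.json"):
    """Merge the mlx-lm provider into the file, creating it if needed."""
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = merge_provider(existing, "mlx-lm", mlx_provider())
    path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged
```

Calling `update_models_file()` leaves any other providers in the file untouched and only adds or replaces the `mlx-lm` entry.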

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default arcee-ai/Trinity-Nano-Preview-MLX-8bit
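Before launching an agent, it can help to confirm that the server is reachable at the configured base URL and reports the expected model id. The sketch below assumes the local server answers `GET /v1/models` with an OpenAI-style list (`{"data": [{"id": ...}]}`); treat both the endpoint and the response shape as assumptions to verify against your installed mlx-lm version. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"
MODEL_ID = "arcee-ai/Trinity-Nano-Preview-MLX-8bit"


def parse_model_ids(body):
    """Extract model ids from an OpenAI-style model list response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def served_model_ids(base_url=BASE_URL, timeout=5.0):
    """Ask the local server which models it reports (assumes GET /models)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read())
```

With the server running, `MODEL_ID in served_model_ids()` gives a quick yes/no check; a connection error usually means the server is not running or is listening on a different port.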

Run Hermes

hermes

MLX LM

How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "arcee-ai/Trinity-Nano-Preview-MLX-8bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
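The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: `build_chat_request`, `extract_reply`, and `chat` are hypothetical names, the response is assumed to follow the OpenAI chat-completions shape (`choices[0].message.content`), and port 8080 is used because that is the port the Pi and Hermes configurations on this page point at. Change `BASE_URL` if your server listens elsewhere.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "arcee-ai/Trinity-Nano-Preview-MLX-8bit"


def build_chat_request(content, base_url=BASE_URL, model=MODEL_ID):
    """Build the POST request that mirrors the curl call above."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Pull the assistant text out of an OpenAI-style response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


def chat(content, timeout=120.0):
    """Send one user message to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(content), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

With the server running, `print(chat("Hello"))` should print the model's reply.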

Trinity Nano MLX 8bit

Trinity Nano Preview is a preview of Arcee AI's 6B MoE model with 1B active parameters. It is the small model in our new Trinity family, a series of open-weight models for enterprises and tinkerers alike.

This is a chat-tuned model, with a delightful personality and charm we think users will love. The model pushes the limits of sparsity in small language models, with only 800M non-embedding parameters active per token, so it may be unstable in certain use cases, especially in this preview.

This is an experimental release. It's fun to talk to, but it will not be hosted anywhere, so download it and try it out yourself!

Trinity Nano Preview is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code data.

Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.

More details, including key architecture decisions, can be found on our blog.

Model Details

Model Architecture: AfmoeForCausalLM
Parameters: 6B, 1B active
Experts: 128 total, 8 active, 1 shared
Context length: 128k
Training Tokens: 10T
License: Apache 2.0
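The figures above allow a rough back-of-the-envelope estimate of sparsity and memory footprint. The sketch below uses the nominal 6B and 1B parameter counts from this card. The quantization overhead is an assumption, not something the card states: it models 8-bit grouped quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group, and it ignores the shared expert and any tensors kept in higher precision. Treat the byte count as an approximation only.

```python
TOTAL_PARAMS = 6e9    # nominal total from the card
ACTIVE_PARAMS = 1e9   # nominal active parameters per token
EXPERTS_TOTAL = 128
EXPERTS_ACTIVE = 8


def routed_fraction(active=EXPERTS_ACTIVE, total=EXPERTS_TOTAL):
    """Fraction of routed experts selected per token (shared expert ignored)."""
    return active / total


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Fraction of all parameters used per token."""
    return active / total


def quantized_bytes(params, bits=8, group_size=64, overhead_bits_per_group=32):
    """Approximate weight size under grouped quantization (assumed layout)."""
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return params * bits_per_weight / 8


if __name__ == "__main__":
    print(f"routed experts per token: {routed_fraction():.2%}")
    print(f"active parameters per token: {active_fraction():.1%}")
    print(f"approx. weight size: {quantized_bytes(TOTAL_PARAMS) / 1e9:.2f} GB")
```

Under these assumptions each weight costs 8.5 bits, so the download is somewhat larger than the 6 GB a naive one-byte-per-parameter estimate would suggest.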

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

model, tokenizer = load("arcee-ai/Trinity-Nano-Preview-MLX-8bit")

prompt = "What is the capital of France?"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

sampler = make_sampler(temp=0.1, top_k=50, top_p=0.1)
logits_processors = make_logits_processors(repetition_penalty=1.05)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
    verbose=True,
)
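For interactive use, printing tokens as they arrive is often nicer than waiting for the full response. The sketch below assumes `mlx_lm.stream_generate` yields response objects with a `.text` attribute holding the newly generated piece, which matches recent mlx-lm releases but is worth checking against your installed version. `collect_stream` is a hypothetical helper, not part of mlx-lm.

```python
def collect_stream(chunks, sink=None):
    """Emit each streamed piece as it arrives and return the full text.

    `chunks` is any iterable of objects with a `.text` attribute.
    `sink` receives each piece; by default pieces are printed without newlines.
    """
    parts = []
    for chunk in chunks:
        piece = chunk.text
        if sink is None:
            print(piece, end="", flush=True)
        else:
            sink(piece)
        parts.append(piece)
    return "".join(parts)


# Usage with the objects from the example above (requires mlx-lm):
#
#   from mlx_lm import stream_generate
#   text = collect_stream(
#       stream_generate(model, tokenizer, prompt, max_tokens=512)
#   )
```

Passing a custom `sink` (for example, a function that appends to a UI buffer) keeps the streaming loop independent of where the text is displayed.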


Safetensors

Model size: 6B params
Tensor type: BF16, U32, F32
Format: MLX, 8-bit

Model tree for arcee-ai/Trinity-Nano-Preview-MLX-8bit

Base model: arcee-ai/Trinity-Nano-Base-Pre-Anneal
Finetuned: arcee-ai/Trinity-Nano-Base
Finetuned: arcee-ai/Trinity-Nano-Preview
Quantized (25): this model