Instructions to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("arcee-ai/Trinity-Nano-Preview-MLX-8bit")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "arcee-ai/Trinity-Nano-Preview-MLX-8bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default arcee-ai/Trinity-Nano-Preview-MLX-8bit
Run Hermes
hermes
- MLX LM
How to use arcee-ai/Trinity-Nano-Preview-MLX-8bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (it listens on port 8080 by default)
mlx_lm.server --model "arcee-ai/Trinity-Nano-Preview-MLX-8bit"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "arcee-ai/Trinity-Nano-Preview-MLX-8bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
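The same chat completion request can be sent from Python with only the standard library. This is a minimal sketch, assuming the server is reachable on port 8080 (the port used in the Pi and Hermes configurations above) and that the response follows the usual OpenAI chat completions shape. The helper names `build_chat_request` and `chat`, and the `max_tokens` value, are illustrative choices rather than part of the original instructions.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally and listening on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "arcee-ai/Trinity-Nano-Preview-MLX-8bit"


def build_chat_request(user_text, max_tokens=256):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,  # illustrative cap on the reply length
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect before any network call is made.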
Trinity Nano MLX 8bit
Trinity Nano Preview is a preview of Arcee AI's 6B MoE model with 1B active parameters. It is the small model in our new Trinity family, a series of open-weight models for enterprises and tinkerers alike.
This is a chat-tuned model with a delightful personality and charm we think users will love. With only 800M non-embedding parameters active per token, it pushes the limits of sparsity in small language models and, as such, may be unstable in certain use cases, especially in this preview.
This is an experimental release. It's fun to talk to but will not be hosted anywhere, so download it and try it out yourself!
Trinity Nano Preview is trained on 10T tokens gathered and curated through a key partnership with Datology, building upon the excellent dataset we used on AFM-4.5B with additional math and code.
Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP (hybrid sharded data parallelism).
More details, including key architecture decisions, can be found on our blog.
Model Details
- Model Architecture: AfmoeForCausalLM
- Parameters: 6B, 1B active
- Experts: 128 total, 8 active, 1 shared
- Context length: 128k
- Training Tokens: 10T
- License: Apache 2.0
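To make the sparsity figures above concrete, the short calculation below derives a few ratios from them. It is a back-of-the-envelope sketch: it assumes the 128 experts are the routed ones with the shared expert counted separately, and it estimates the 8-bit weight footprint while ignoring quantization scales, biases, and runtime memory such as the KV cache.

```python
# Figures taken from the model details above; derived values are estimates.
TOTAL_PARAMS = 6e9
ACTIVE_PARAMS = 1e9
ROUTED_EXPERTS = 128
ACTIVE_ROUTED_EXPERTS = 8
SHARED_EXPERTS = 1
BITS_PER_WEIGHT = 8  # assumption: ignores quantization scales and biases

# Share of all parameters that take part in computing one token.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Share of routed experts selected per token.
active_expert_fraction = ACTIVE_ROUTED_EXPERTS / ROUTED_EXPERTS

# Experts evaluated per token if the shared expert always runs (assumption).
experts_per_token = ACTIVE_ROUTED_EXPERTS + SHARED_EXPERTS

# Rough size of the weights alone, in gigabytes.
weight_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9

print(f"Active parameters per token: {active_param_fraction:.1%}")
print(f"Routed experts used per token: {active_expert_fraction:.2%}")
print(f"Experts evaluated per token: {experts_per_token}")
print(f"Approximate 8-bit weight size: {weight_gb:.1f} GB")
```

Under these assumptions, roughly one sixth of the parameters are active per token, and the weights alone take on the order of 6 GB before any runtime overhead.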
Use with mlx
pip install mlx-lm
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors
model, tokenizer = load("arcee-ai/Trinity-Nano-Preview-MLX-8bit")
prompt = "What is the capital of France?"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
sampler = make_sampler(temp=0.1, top_k=50, top_p=0.1)
logits_processors = make_logits_processors(repetition_penalty=1.05)
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
    verbose=True,
)
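The sampler settings used above (`temp=0.1`, `top_k=50`, `top_p=0.1`) make generation close to greedy. The pure-Python sketch below illustrates why on a toy set of logits. It is an illustration of temperature, top-k, and top-p filtering in general, not mlx-lm's actual implementation; the function name `filter_distribution`, the order in which the filters are applied, and the example logits are assumptions made for this sketch.

```python
import math


def filter_distribution(logits, temp=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, and top-p.

    temp must be positive in this sketch.
    """
    # Temperature: divide logits before the softmax; low values sharpen it.
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [(i, w / total) for i, w in enumerate(weights)]

    # Rank tokens from most to least likely.
    probs.sort(key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in probs:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so their probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {index: p / mass for index, p in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
near_greedy = filter_distribution(toy_logits, temp=0.1, top_k=50, top_p=0.1)
looser = filter_distribution(toy_logits, temp=1.0, top_k=2, top_p=1.0)
print(near_greedy)
print(looser)
```

With the near-greedy settings only the most likely token survives, because its probability alone already exceeds the `top_p` threshold. Raising `temp` and `top_p` leaves more candidates in play and makes the output more varied.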
8-bit
Model tree for arcee-ai/Trinity-Nano-Preview-MLX-8bit
Base model: arcee-ai/Trinity-Nano-Base-Pre-Anneal