Instructions for using Arioron/Vex-Amber-Mini-1.2 with libraries and local apps.

Libraries

How to use Arioron/Vex-Amber-Mini-1.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Arioron/Vex-Amber-Mini-1.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Arioron/Vex-Amber-Mini-1.2")
model = AutoModelForCausalLM.from_pretrained("Arioron/Vex-Amber-Mini-1.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Arioron/Vex-Amber-Mini-1.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Arioron/Vex-Amber-Mini-1.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arioron/Vex-Amber-Mini-1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
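
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the default URL assumes the server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "Arioron/Vex-Amber-Mini-1.2"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 instead.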

Use Docker

docker model run hf.co/Arioron/Vex-Amber-Mini-1.2

SGLang

How to use Arioron/Vex-Amber-Mini-1.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Arioron/Vex-Amber-Mini-1.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arioron/Vex-Amber-Mini-1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Arioron/Vex-Amber-Mini-1.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Arioron/Vex-Amber-Mini-1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Arioron/Vex-Amber-Mini-1.2 with Docker Model Runner:
```
docker model run hf.co/Arioron/Vex-Amber-Mini-1.2
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Vex Amber Mini 1.2

Model Description

Vex Amber Mini 1.2 is a 0.6B-parameter decoder-only transformer model tuned for mathematical reasoning and code generation. It builds on Vex Amber Mini 1.0, and the authors report state-of-the-art results for its size class, with the strongest gains in programming tasks and mathematical problem solving (see Performance).

Developed by: Arioron
Model type: Decoder-only Transformer
Language(s): English
License: Apache 2.0
Finetuned from model: Arioron/Vex-Amber-Mini-1.0

Model Sources

Base Model: Qwen/Qwen3-0.6B
Repository: https://huggingface.co/Arioron/Vex-Amber-Mini-1.2
Documentation: Arioron Model Docs

Performance

Benchmark	Metric	Score
HumanEval	Pass@1	21.34%
MBPP	Pass@1	38.7%
GSM8K	Accuracy	65.2%
MATH	Accuracy	45.8%
MMLU	Accuracy	58.3%
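
The card does not state how Pass@1 was measured or how many samples were drawn per problem. As a point of reference only, the sketch below shows the widely used unbiased pass@k estimator; that this estimator was used for the scores above is an assumption, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem
    c: samples that passed the unit tests
    k: attempt budget being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, averaged over all problems.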

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Arioron/Vex-Amber-Mini-1.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Code generation example
prompt = "Write a Python function to reverse a linked list:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Capabilities

🎯 Code Generation

# Example: The model can generate efficient algorithms
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

🔢 Mathematical Reasoning

# Example: Solve quadratic equations and explain steps
"""
Solve: x² - 5x + 6 = 0
Step 1: Factor the equation: (x - 2)(x - 3) = 0
Step 2: Set each factor to zero: x - 2 = 0 or x - 3 = 0
Step 3: Solve for x: x = 2 or x = 3
"""

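The factoring steps above can be checked numerically. This is a small sketch, not model output: `solve_quadratic` is a hypothetical helper that applies the quadratic formula and then substitutes each root back into the equation.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 in ascending order."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real roots
    root = math.sqrt(discriminant)
    # A set removes the duplicate when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


roots = solve_quadratic(1, -5, 6)
# Substitute each root back into x^2 - 5x + 6 to confirm it is a solution.
residuals = [x * x - 5 * x + 6 for x in roots]
print(roots, residuals)  # [2.0, 3.0] [0.0, 0.0]
```
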
Training Details

Training Data

The model was trained on a carefully curated mixture of:

45% Code (Python, JavaScript, Java, C++)
30% Mathematical content (textbooks, problems, proofs)
15% General reasoning tasks
10% Conversational data
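
The card does not say whether these percentages are measured in tokens, examples, or bytes, nor how the mixture was sampled during training. Purely as an illustration, the sketch below assumes per-example sampling with the listed weights; the dictionary keys are shorthand labels, not dataset names from the source.

```python
import random

# Proportions from the list above; the keys are shorthand labels.
MIXTURE = {"code": 0.45, "math": 0.30, "reasoning": 0.15, "conversation": 0.10}


def sample_sources(num_examples, seed=0):
    """Draw a data source for each training example according to MIXTURE."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = list(MIXTURE.values())
    return rng.choices(names, weights=weights, k=num_examples)


draws = sample_sources(10_000)
# Empirical share of each source; these approach the weights as the sample grows.
shares = {name: draws.count(name) / len(draws) for name in MIXTURE}
print(shares)
```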

Technical Specifications

Architecture: Transformer-based decoder
Context Length: 8,192 tokens
Precision: float16
Training Framework: Native PyTorch
Positional Encoding: Rotary Positional Embeddings (RoPE)
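
As a rough guide to hardware requirements, the weight footprint follows from the parameter count and precision. The sketch below is back-of-the-envelope arithmetic: it treats the rounded 0.6B figure as exact and ignores activations, the KV cache, and framework overhead, so real memory use will be higher.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="float16"):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 0.6B parameters at 2 bytes each.
print(f"{weight_memory_gb(0.6e9):.1f} GB")  # 1.2 GB
```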

Intended Uses

Direct Use

Code completion and generation
Mathematical problem solving
Educational assistance
Technical documentation
Research prototyping

Downstream Use

Integration into IDEs and code editors
Educational platforms
Technical chatbots
Research tools for mathematics and computer science

Limitations

The 0.6B parameter count may limit performance on extremely complex, multi-step reasoning tasks
While strong for its size, it may not match the performance of larger models (7B+) on some benchmarks
Context window of 8K tokens may be insufficient for very long code files or documents
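
One way to work within the context limit is to budget tokens before calling `generate`. The sketch below assumes the 8,192-token window covers the prompt and the generated tokens together, and it uses left truncation (dropping the oldest tokens), which is only one possible policy. The helper names are illustrative.

```python
CONTEXT_LENGTH = 8192  # tokens, from the Technical Specifications section


def max_prompt_tokens(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Largest prompt that still leaves room for max_new_tokens of output."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    return context_length - max_new_tokens


def truncate_left(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep only the most recent tokens so prompt + output fits the window."""
    budget = max_prompt_tokens(max_new_tokens, context_length)
    return token_ids[-budget:]
```

For example, with `max_new_tokens=256` the prompt may use at most 7,936 tokens. For long code files, splitting the input into chunks is often a better choice than truncation.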

Ethical Considerations

The model is trained on publicly available data and is designed to be helpful, harmless, and honest. However, as with any language model:

Outputs should be verified for accuracy in critical applications
The model should not be used for high-stakes decisions without human oversight
Users should be aware of potential biases in training data
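
For generated code, verification can be as simple as running the candidate against known input/output pairs before using it. The following is a minimal sketch with a hypothetical `check_candidate` helper; the `quick_sort` function from the Capabilities section stands in for model output. It is not a sandbox, so untrusted code should still be run in an isolated environment.

```python
def check_candidate(func, cases):
    """Run func on each (args, expected) pair and collect the failures."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # a crash counts as a failure
            failures.append((args, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, actual))
    return failures


# Stand-in for model output: the quick_sort example from the Capabilities section.
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)


cases = [(([],), []), (([3, 1, 2],), [1, 2, 3]), (([2, 2, 1],), [1, 2, 2])]
failures = check_candidate(quick_sort, cases)
print("all cases passed" if not failures else failures)
```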

Citation

If you use this model in your research, please cite:

@misc{vexambermini1.2,
  title = {Vex Amber Mini 1.2: A Compact Language Model for Code and Mathematics},
  author = {Arioron},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Arioron/Vex-Amber-Mini-1.2}}
}

Contact

Acknowledgements

Thanks to the open-source community and the Qwen team for their foundational work. Special thanks to all contributors and researchers who have advanced the field of efficient language modeling.

For technical details, training recipes, and comprehensive evaluation results, please refer to our technical documentation.

Model size: 0.6B params (Safetensors)
Tensor type: F16

Model tree for Arioron/Vex-Amber-Mini-1.2

Qwen/Qwen3-0.6B-Base (base model)
Qwen/Qwen3-0.6B (finetuned)
Arioron/Vex-Amber-Mini-1.0 (finetuned)
Arioron/Vex-Amber-Mini-1.2 (finetuned, this model)

Finetunes: 1 model
Quantizations: 1 model
