---
language:
- en
license: apache-2.0
base_model:
- Qwen/Qwen3.5-2B
library_name: transformers
pipeline_tag: text-generation
tags:
- soren
- soren-1
- soren-1-small
- syntropy-ai
- project-syntropic
- chat
- assistant
- conversational
- instruct
- reasoning
- thinking
- long-context
- qwen
- qwen3
- qwen3.5
- qwen3.5-2b
- transformer
- decoder-only
- sft
- dpo
- lora
- merged-lora
- unsloth
- trl
- preference-optimization
- supervised-finetuning
- coding
- code-generation
- code-assistant
- software-engineering
- python
- reasoning-model
- math
- problem-solving
- instruction-following
- agentic
- analytical
- 1m-context
- long-context-model
- yarn
- yarn-4x
- 1048576-context
- human-like
- low-hallucination
- honest-ai
- helpful-assistant
- rtx-pro-6000-blackwell
- blackwell
- nvidia
- claude-style
- reasoning-assistant
- coding-model
- compact-model
- edge-ai
- local-llm
- open-weights
- open-source-ai
datasets:
- syntropy-ai/Its-Me-Soren
- Roman1111111/claude-sonnet-4.6-120000x
- angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
- TeichAI/Claude-Opus-4.6-Reasoning-887x
- dalisoft/claude-opus-4.6-high-reasoning-700x
- TeichAI/claude-4.5-opus-high-reasoning-250x
- Roman1111111/claude-opus-4.6-10000x
- TeichAI/Claude-Sonnet-4.6-Reasoning-1100x
- Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500
- TeichAI/lordx64-claude-opus-4.7-max-cleaned
- ianncity/Hunter-Alpha-Programming-160000x
- TeichAI/gpt-5-codex-250x
- TeichAI/gpt-5.2-high-reasoning-250x
- Nettoov/Gpt-5.4-Xhigh-Reasoning-2000x
- nvidia/HelpSteer
- m-a-p/CodeFeedback-Filtered-Instruction
- Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K
- openbmb/UltraFeedback
- argilla/OpenHermesPreferences
- argilla/distilabel-capybara-dpo-7k-binarized
- Vezora/Code-Preference-Pairs
- Code-Refinement/dpo-sample-perfect-less
widget:
- text: "Write a Python implementation of A* pathfinding."
- text: "Explain special relativity like I'm 15."
- text: "Debug this Rust code."
- text: "Design a PostgreSQL schema for a social network."
inference: false
---
# Soren-1-Small
<div align="center">

![Syntropy-AI](https://img.shields.io/badge/Syntropy--AI-Project%20Syntropic-F5C842?style=for-the-badge)
![Base Model](https://img.shields.io/badge/Base-Qwen3.5--2B-blue?style=for-the-badge)
![Context](https://img.shields.io/badge/Context-1M%20tokens-green?style=for-the-badge)
![License](https://img.shields.io/badge/License-Apache%202.0-orange?style=for-the-badge)

**Soren is a fine-tuned AI assistant built to think before it speaks, write code that actually works, and talk like a person, not a product.**

</div>
---
## Note
A **"math" patch** is planned. The model scored 60% on GSM8K, so we are preparing a patch to improve its math performance.
## What is Soren?
Soren is an AI assistant created by Andy at Syntropy-AI as part of Project Syntropic. It is built on Qwen3.5-2B and trained through a carefully sequenced pipeline of supervised fine-tuning and direct preference optimization to produce a model with four core strengths:
- **Reasoning first** — Soren thinks through problems before answering. It uses extended chain-of-thought internally and surfaces its reasoning when it helps the user follow along.
- **Low hallucination** — Soren does not invent facts, statistics, citations, or technical details. When it does not know something, it says so.
- **Honest code** — Soren never fabricates APIs, libraries, or function signatures. It writes complete implementations, not stubs. No placeholder comments, no shortcuts.
- **Human-like tone** — Soren does not open with "Certainly!" or "As an AI...". It communicates directly and warmly, the way a knowledgeable friend would.
Soren-1-Small is the first release in the Soren-1 family. Medium (9B) and Large (27B) are planned.
### Context Window
Soren-1-Small supports a **1,048,576 token (~1M) context window** via YaRN 4x extension applied post-training. The base Qwen3.5-2B native context is 262,144 tokens.
---
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Precision | BF16 full LoRA (no quantization) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 4096–8192 (session dependent) |
| Effective batch size | 16 |
| Optimizer | adamw_torch |
| Framework | Unsloth + TRL |
| Hardware | NVIDIA RTX PRO 6000 Blackwell (96GB VRAM) |
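
To make the table concrete, here is a minimal sketch that collects the LoRA hyperparameters in a plain dataclass and derives the conventional LoRA scaling factor `alpha / rank`. The class name `LoraSettings` and its field names are hypothetical and are not taken from the training scripts. The sketch also assumes the standard scaling rule rather than a variant such as rank-stabilized LoRA, which this card does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container mirroring the table above (not the real training config)."""

    rank: int = 64
    alpha: int = 128
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Conventional LoRA multiplies the low-rank update by alpha / rank.
        return self.alpha / self.rank


settings = LoraSettings()
print(settings.scaling)  # 2.0
print(len(settings.target_modules))  # 7
```

With rank 64 and alpha 128, each adapter update is scaled by 2 before it is added to the frozen weights, and adapters are attached to all seven attention and MLP projections listed in the table.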
---
## Training Pipeline
### SFT — Supervised Fine-Tuning
Training used sequential LoRA chaining: each session trains a LoRA adapter and merges it into the current model weights, and the next session trains a new adapter on the merged result.
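
The merge-then-continue idea can be illustrated with a toy example, assuming the standard LoRA merge rule `W + (alpha / rank) * B @ A`. The helper names `matmul` and `merge_adapter` are hypothetical, the matrices are tiny nested lists, and the adapters are fixed by hand rather than trained. This is a sketch of the arithmetic only, not the Unsloth or TRL code used for Soren.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_adapter(weight, lora_b, lora_a, alpha, rank):
    """Fold one LoRA update into the weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight and two rank-1 adapters standing in for two sessions.
weight = [[1.0, 0.0], [0.0, 1.0]]
sessions = [
    {"B": [[1.0], [0.0]], "A": [[0.0, 2.0]]},
    {"B": [[0.0], [1.0]], "A": [[1.0, 0.0]]},
]

for adapter in sessions:
    # Each session starts from the result of the previous merge.
    weight = merge_adapter(weight, adapter["B"], adapter["A"], alpha=2, rank=1)

print(weight)  # [[1.0, 4.0], [2.0, 1.0]]
```

Because every merge is folded into the weights before the next session begins, the final checkpoint is a single dense model with no adapters left to load at inference time.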
**Session 0 — Soren Identity v1**
- Dataset: `syntropy-ai/Its-Me-Soren` (200 examples)
- LR: `5e-5` | Epochs: 3
- Purpose: Establish Soren's core identity and persona before any other training
**Session 1 — Reasoning Warmup**
- Dataset: `Roman1111111/claude-sonnet-4.6-120000x` (80k slice, capped at 250 steps)
- LR: `2e-4`
- Purpose: Broad instruction following and conversational quality foundation
**Session 2A — Claude Core (~26k examples, capped at 800 steps)**
- LR: `5e-5`
- Datasets:
  - `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k` (8.7k slice)
  - `TeichAI/Claude-Opus-4.6-Reasoning-887x`
  - `dalisoft/claude-opus-4.6-high-reasoning-700x`
  - `TeichAI/claude-4.5-opus-high-reasoning-250x`
  - `Roman1111111/claude-opus-4.6-10000x`
  - `TeichAI/Claude-Sonnet-4.6-Reasoning-1100x`
  - `Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500`
  - `TeichAI/lordx64-claude-opus-4.7-max-cleaned`
**Session 2B — Code Core (~24.5k examples)**
- LR: `3e-5`
- Datasets:
  - `ianncity/Hunter-Alpha-Programming-160000x` (2k slice)
  - `TeichAI/gpt-5-codex-250x`
  - `TeichAI/gpt-5.2-high-reasoning-250x`
  - `Nettoov/Gpt-5.4-Xhigh-Reasoning-2000x`
  - `nvidia/HelpSteer` (5k slice)
  - `m-a-p/CodeFeedback-Filtered-Instruction` (5k slice)
  - `Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K` (10k slice)
**Session 4 — Soren Identity v2 (Rescue)**
- Dataset: `syntropy-ai/Its-Me-Soren` (200 examples)
- LR: `5e-5` | Epochs: 3
- Purpose: Reinforce Soren's identity after ~60k examples of third-party data
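
The step caps on Sessions 1 and 2A can be converted into a rough count of examples seen. The sketch below assumes that the effective batch size of 16 from the configuration table applied to both sessions and that each optimizer step consumed one full batch with no sequence packing; neither assumption is confirmed by this card. The function name `examples_seen` is hypothetical.

```python
EFFECTIVE_BATCH_SIZE = 16  # from the Training Configuration table


def examples_seen(max_steps, batch_size=EFFECTIVE_BATCH_SIZE):
    """Examples consumed when training stops at max_steps (one full batch per step)."""
    return max_steps * batch_size


# (session label, step cap, approximate examples available)
capped_sessions = [
    ("Session 1", 250, 80_000),
    ("Session 2A", 800, 26_000),
]

for name, steps, available in capped_sessions:
    seen = examples_seen(steps)
    print(f"{name}: {seen} of ~{available} examples ({seen / available:.0%})")
```

Under these assumptions, Session 1 covers 4,000 of the 80k-example slice (5%) and Session 2A covers 12,800 of roughly 26k examples (about 49%), so both sessions stop well before a full pass over their data.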
---
### DPO — Direct Preference Optimization
**DPO1 — General Preference (~34k examples)**
- LR: `1e-6` | Beta: `0.1`
- Datasets:
  - `openbmb/UltraFeedback` (17k slice)
  - `argilla/OpenHermesPreferences` (10k slice)
  - `argilla/distilabel-capybara-dpo-7k-binarized` (full, ~7k)
**DPO2 — Code Preference (~6k examples)**
- LR: `1e-6` | Beta: `0.1`
- Datasets:
  - `Vezora/Code-Preference-Pairs` (5k slice)
  - `Code-Refinement/dpo-sample-perfect-less` (full)
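
For readers unfamiliar with the role of `beta`, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair, computed from summed sequence log-probabilities. It assumes the default DPO objective; this card does not state which loss variant was used, and the function name `dpo_loss` and its argument names are hypothetical.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
print(round(baseline, 4))  # 0.6931

# Policy prefers the chosen response more than the reference does: lower loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(improved < baseline)  # True
```

A small `beta` such as 0.1 keeps the margin term small for a given shift in log-probabilities, which limits how hard each preference pair pushes the policy away from the reference model.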
---
### Post-Training
- **YaRN 4x** applied to `config.json` — extends context from 262k to ~1M tokens
- **Default system prompt** baked into the chat template — Soren's identity, tone, and behavioral guidelines are always active without needing to pass a system message at inference time
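
As a rough illustration of the YaRN step, the sketch below patches a config dictionary and checks the resulting context length. The `rope_scaling` key names follow the convention documented for earlier Qwen releases and are an assumption here; the actual `config.json` for this model may use different keys or nest them differently. The helper name `apply_yarn` is hypothetical.

```python
import json

NATIVE_CONTEXT = 262_144  # base model context length stated in this card
YARN_FACTOR = 4.0


def apply_yarn(config, factor=YARN_FACTOR, native=NATIVE_CONTEXT):
    """Return a copy of a config dict with a YaRN rope-scaling entry added."""
    patched = dict(config)
    # Key names are an assumption borrowed from earlier Qwen model cards.
    patched["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native,
    }
    patched["max_position_embeddings"] = int(native * factor)
    return patched


config = apply_yarn({"max_position_embeddings": NATIVE_CONTEXT})
print(json.dumps(config, indent=2))
print(config["max_position_embeddings"])  # 1048576
```

The arithmetic is the part that this card confirms: 262,144 tokens multiplied by a factor of 4 gives 1,048,576 tokens.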
---
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in BF16 and let device_map place them on available hardware.
model = AutoModelForCausalLM.from_pretrained(
    "syntropy-ai/Soren-1-Small",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("syntropy-ai/Soren-1-Small")

messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]

# Render the conversation with the chat template, which carries the default system prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
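
With `enable_thinking=True`, the decoded text may contain the model's reasoning followed by the final answer. The helper below is a minimal sketch for separating the two, assuming the output follows the Qwen convention of closing the reasoning with a `</think>` tag; this card does not confirm the exact output format, and the function name `split_thinking` is hypothetical.

```python
def split_thinking(text, close_tag="</think>"):
    """Split decoded output into (reasoning, answer) around the closing think tag."""
    reasoning, found, answer = text.partition(close_tag)
    if not found:
        # No closing tag: treat the whole text as the answer.
        return "", text.strip()
    # Drop an opening tag if the model emitted one itself.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


sample = "<think>\n2 is the smallest prime.\n</think>\n\nUse trial division."
reasoning, answer = split_thinking(sample)
print(reasoning)  # 2 is the smallest prime.
print(answer)  # Use trial division.
```

Splitting on the closing tag alone keeps the helper usable whether or not the opening tag appears in the generated text.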
---
## Credits
**Created by** [Andy](https://github.com/Andy-ML-And-AI) at [Syntropy-AI](https://huggingface.co/syntropy-ai), Project Syntropic

**Compute** generously provided by [Lightning.ai](https://lightning.ai): NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)

**Training framework**: [Unsloth](https://github.com/unslothai/unsloth) by Daniel Han-Chen and the Unsloth team

**Dataset authors**: All dataset creators listed in the pipeline above. Thank you for making your data public.
---
## License
Apache 2.0