---
language:
  - en

license: apache-2.0

base_model:
  - Qwen/Qwen3.5-2B

library_name: transformers

pipeline_tag: text-generation

tags:
  - soren
  - soren-1
  - soren-1-small
  - syntropy-ai
  - project-syntropic


  - chat
  - assistant
  - conversational
  - instruct
  - reasoning
  - thinking
  - long-context


  - qwen
  - qwen3
  - qwen3.5
  - qwen3.5-2b
  - transformer
  - decoder-only


  - sft
  - dpo
  - lora
  - merged-lora
  - unsloth
  - trl
  - preference-optimization
  - supervised-finetuning


  - coding
  - code-generation
  - code-assistant
  - software-engineering
  - python
  - reasoning-model
  - math
  - problem-solving
  - instruction-following
  - agentic
  - analytical

  - 1m-context
  - long-context-model
  - yarn
  - yarn-4x
  - 1048576-context

  - human-like
  - low-hallucination
  - honest-ai
  - helpful-assistant

  - rtx-pro-6000-blackwell
  - blackwell
  - nvidia

  - claude-style
  - reasoning-assistant
  - coding-model
  - compact-model
  - edge-ai
  - local-llm
  - open-weights
  - open-source-ai

datasets:
  - syntropy-ai/Its-Me-Soren
  - Roman1111111/claude-sonnet-4.6-120000x
  - angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
  - TeichAI/Claude-Opus-4.6-Reasoning-887x
  - dalisoft/claude-opus-4.6-high-reasoning-700x
  - TeichAI/claude-4.5-opus-high-reasoning-250x
  - Roman1111111/claude-opus-4.6-10000x
  - TeichAI/Claude-Sonnet-4.6-Reasoning-1100x
  - Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500
  - TeichAI/lordx64-claude-opus-4.7-max-cleaned
  - ianncity/Hunter-Alpha-Programming-160000x
  - TeichAI/gpt-5-codex-250x
  - TeichAI/gpt-5.2-high-reasoning-250x
  - Nettoov/Gpt-5.4-Xhigh-Reasoning-2000x
  - nvidia/HelpSteer
  - m-a-p/CodeFeedback-Filtered-Instruction
  - Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K
  - openbmb/UltraFeedback
  - argilla/OpenHermesPreferences
  - argilla/distilabel-capybara-dpo-7k-binarized
  - Vezora/Code-Preference-Pairs
  - Code-Refinement/dpo-sample-perfect-less

widget:
  - text: "Write a Python implementation of A* pathfinding."
  - text: "Explain special relativity like I'm 15."
  - text: "Debug this Rust code."
  - text: "Design a PostgreSQL schema for a social network."

inference: false

---

# Soren-1-Small

<div align="center">

![Syntropy-AI](https://img.shields.io/badge/Syntropy--AI-Project%20Syntropic-F5C842?style=for-the-badge)
![Base Model](https://img.shields.io/badge/Base-Qwen3.5--2B-blue?style=for-the-badge)
![Context](https://img.shields.io/badge/Context-1M%20tokens-green?style=for-the-badge)
![License](https://img.shields.io/badge/License-Apache%202.0-orange?style=for-the-badge)

**Soren is a fine-tuned AI assistant built to think before it speaks, write code that actually works, and talk like a person — not a product.**

</div>

---

## Note
A **math patch** is planned: the model scored 60% on GSM8K, so we are working on a follow-up patch targeting math.


## What is Soren?

Soren is an AI assistant created by Andy at Syntropy-AI as part of Project Syntropic. It is built on Qwen3.5-2B and trained through a carefully sequenced pipeline of supervised fine-tuning and direct preference optimization to produce a model with four core strengths:

- **Reasoning first:** Soren thinks through problems before answering. It uses extended chain-of-thought internally and surfaces its reasoning when that helps the user follow along.
- **Low hallucination:** Soren is trained to avoid inventing facts, statistics, citations, or technical details, and to say so when it does not know something.
- **Honest code:** Soren is trained not to fabricate APIs, libraries, or function signatures, and to write complete implementations rather than stubs or placeholder comments.
- **Human-like tone:** Soren does not open with "Certainly!" or "As an AI...". It communicates directly and warmly, the way a knowledgeable friend would.

Soren-1-Small is the first release in the Soren-1 family. Medium (9B) and Large (27B) are planned.

### Context Window

Soren-1-Small supports a **1,048,576 token (~1M) context window** via YaRN 4x extension applied post-training. The base Qwen3.5-2B native context is 262,144 tokens.
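
The context figure follows directly from the two numbers above. The sketch below checks the arithmetic and shows what a YaRN entry could look like. The `rope_scaling` key names are an assumption based on the convention Qwen documents for earlier model families; the `config.json` shipped with this model may use different keys or nesting.

```python
# Values stated in this card.
NATIVE_CONTEXT = 262_144  # native context of the base model, in tokens
YARN_FACTOR = 4           # "YaRN 4x"

extended_context = NATIVE_CONTEXT * YARN_FACTOR
print(extended_context)  # 1048576

# ASSUMPTION: illustrative key names only, not copied from this
# repository's config.json.
rope_scaling = {
    "rope_type": "yarn",
    "factor": float(YARN_FACTOR),
    "original_max_position_embeddings": NATIVE_CONTEXT,
}
```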

---

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Precision | BF16 LoRA, unquantized base weights (no quantization) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 4096–8192 (session dependent) |
| Effective batch size | 16 |
| Optimizer | adamw_torch |
| Framework | Unsloth + TRL |
| Hardware | NVIDIA RTX PRO 6000 Blackwell (96GB VRAM) |

---

## Training Pipeline

### SFT — Supervised Fine-Tuning

Training followed a sequential LoRA chaining approach. Each session trains a LoRA adapter and merges it into the current weights; the next session then trains on the merged result.
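
A toy sketch of the chaining idea, in plain Python, may help. It assumes the standard LoRA merge rule `W' = W + (alpha / rank) * B @ A` and uses the rank and alpha from the configuration table for the scale. The tiny matrices and session names are hypothetical stand-ins for trained adapters; this is not the actual training script.

```python
from typing import List

Matrix = List[List[float]]

LORA_RANK = 64    # from the Training Configuration table
LORA_ALPHA = 128  # from the Training Configuration table
SCALE = LORA_ALPHA / LORA_RANK  # standard LoRA scaling, 2.0 here


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, scale: float) -> Matrix:
    """Fold one adapter into the weights: W' = W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical "trained" adapters (rank 1 for readability), one per session.
sessions = [
    ("session_0", [[0.5, 0.0]], [[1.0], [0.0]]),   # (name, A, B)
    ("session_1", [[0.0, 0.25]], [[0.0], [1.0]]),
]

weight: Matrix = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for one base weight matrix
for name, lora_a, lora_b in sessions:
    # Each session starts from the result of the previous merge.
    weight = merge_lora(weight, lora_a, lora_b, SCALE)

print(weight)  # [[2.0, 0.0], [0.0, 1.5]]
```

Because each merge changes the weights the next adapter is trained against, session order matters, which is why the identity data is applied both first and last.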

**Session 0 — Soren Identity v1**
- Dataset: `syntropy-ai/Its-Me-Soren` (200 examples)
- LR: `5e-5` | Epochs: 3
- Purpose: Establish Soren's core identity and persona before any other training

**Session 1 — Reasoning Warmup**
- Dataset: `Roman1111111/claude-sonnet-4.6-120000x` (80k slice, capped at 250 steps)
- LR: `2e-4`
- Purpose: Broad instruction following and conversational quality foundation

**Session 2A — Claude Core (~26k examples, capped at 800 steps)**
- LR: `5e-5`
- Datasets:
  - `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k` (8.7k slice)
  - `TeichAI/Claude-Opus-4.6-Reasoning-887x`
  - `dalisoft/claude-opus-4.6-high-reasoning-700x`
  - `TeichAI/claude-4.5-opus-high-reasoning-250x`
  - `Roman1111111/claude-opus-4.6-10000x`
  - `TeichAI/Claude-Sonnet-4.6-Reasoning-1100x`
  - `Hastagaras/Claude-Sonnet-X-Opus-4.6-Reasoning-small-500`
  - `TeichAI/lordx64-claude-opus-4.7-max-cleaned`

**Session 2B — Code Core (~24.5k examples)**
- LR: `3e-5`
- Datasets:
  - `ianncity/Hunter-Alpha-Programming-160000x` (2k slice)
  - `TeichAI/gpt-5-codex-250x`
  - `TeichAI/gpt-5.2-high-reasoning-250x`
  - `Nettoov/Gpt-5.4-Xhigh-Reasoning-2000x`
  - `nvidia/HelpSteer` (5k slice)
  - `m-a-p/CodeFeedback-Filtered-Instruction` (5k slice)
  - `Sashvat/HyperThink-X-Nvidia-Opencode-Reasoning-200K` (10k slice)

**Session 4 — Soren Identity v2 (Rescue)**
- Dataset: `syntropy-ai/Its-Me-Soren` (200 examples)
- LR: `5e-5` | Epochs: 3
- Purpose: Reinforce Soren's identity after ~60k examples of third-party data

---

### DPO — Direct Preference Optimization

**DPO1 — General Preference (~34k examples)**
- LR: `1e-6` | Beta: `0.1`
- Datasets:
  - `openbmb/UltraFeedback` (17k slice)
  - `argilla/OpenHermesPreferences` (10k slice)
  - `argilla/distilabel-capybara-dpo-7k-binarized` (full, ~7k)

**DPO2 — Code Preference (~6k examples)**
- LR: `1e-6` | Beta: `0.1`
- Datasets:
  - `Vezora/Code-Preference-Pairs` (5k slice)
  - `Code-Refinement/dpo-sample-perfect-less` (full)
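
For readers unfamiliar with what `Beta` controls, here is a minimal sketch of the standard DPO objective for a single preference pair. It is the textbook formula, not code taken from this training run, and the log-probability values in the usage lines are made up for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # value listed for DPO1 and DPO2
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical log-probabilities, for illustration only.
at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # policy now favors "chosen"
print(at_start > improved)  # True
```

A smaller beta lets the policy move further from the reference before the loss flattens out; a larger beta keeps it closer.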

---

### Post-Training

- **YaRN 4x** applied to `config.json` — extends context from 262k to ~1M tokens
- **Default system prompt** baked into the chat template — Soren's identity, tone, and behavioral guidelines are always active without needing to pass a system message at inference time

---

## Usage

```python
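from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged weights in BF16; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "syntropy-ai/Soren-1-Small",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("syntropy-ai/Soren-1-Small")

# No system message is needed: the default system prompt is baked into the chat template.
messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]

# Render the conversation to a prompt string with thinking mode enabled.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,  # sampling must be on for temperature to take effect
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))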
```
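
With thinking enabled, you may want to separate the reasoning from the final answer. The helper below is a sketch that assumes the decoded text wraps its reasoning in Qwen-style `<think>…</think>` tags; this card does not document the output format, so check a real generation before relying on it. `split_thinking` is a hypothetical helper name, not part of `transformers`.

```python
from typing import Tuple


def split_thinking(text: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    ASSUMPTION: reasoning is delimited by <think>...</think>. If the closing
    tag is absent, the whole text is treated as the answer.
    """
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the opening tag if the template left it in the decoded text.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Hand-written sample string, not real model output.
sample = "<think>\n7 has no divisors other than 1 and 7.\n</think>\n\n7 is prime."
reasoning, answer = split_thinking(sample)
print(answer)  # 7 is prime.
```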

---

## Credits

**Created by** [Andy](https://github.com/Andy-ML-And-AI) at [Syntropy-AI](https://huggingface.co/syntropy-ai) — Project Syntropic

**Compute** generously provided by [Lightning.ai](https://lightning.ai) — NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)

**Training framework** — [Unsloth](https://github.com/unslothai/unsloth) by Daniel Han-Chen and the Unsloth team

**Dataset authors** — All dataset creators listed in the pipeline above. Thank you for making your data public.

---

## License

Apache 2.0