Commit 7a96f3e by SeaWolf-AI · verified · 1 parent: 06f90e5

Add README: Darwin V8 lastbrain (Qwen3.5-2B father + Opus-Distill LoRA mother merged)

Files changed (1): README.md (+199 -0)
README.md ADDED
@@ -0,0 +1,199 @@
---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 lastbrain: Darwin V8

**A Claude Opus distillation model built on Darwin V8 (2B parameters)**

- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1)
- 👶 **Child (This model)**: `FINAL-Bench/lastbrain`, a merged, full-weight standalone model

---

## 📦 Features

- **Base**: Qwen3.5-2B (2.3B parameters, hybrid attention)
- **Training**: SFT + LoRA (`all-linear`, rank=16, α=32)
- **Teachers**: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- **Data**: 4,451 high-quality reasoning traces (from 4 public datasets)
- **Merged**: the LoRA adapter is fully merged into the base weights, so the model **runs standalone**

---

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
# Render the chat template to a prompt string, then tokenize it.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False); no gradients are needed for inference.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt tokens.
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

**Example output**:
```
To find the speed of the train in km/h, we need to convert the given time from minutes to hours.

**Given:**
- Distance = 60 km
- Time = 45 minutes

**Step 1: Convert time to hours**
Since there are 60 minutes in 1 hour:
$$\text{Time in hours} = \frac{45}{60} = 0.75 \text{ hours}$$

**Step 2: Calculate speed**
$$\text{Speed} = \frac{60}{0.75} = 80 \text{ km/h}$$

**Final Answer:** The speed of the train is **80 km/h**.
```
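
The arithmetic in the example output can be checked independently of the model. The snippet below is a minimal sketch that redoes the two steps with exact fractions; the helper name `speed_kmh` is hypothetical and not part of the model or its tooling.

```python
from fractions import Fraction


def speed_kmh(distance_km, minutes):
    """Average speed in km/h for a distance covered in the given minutes."""
    hours = Fraction(minutes, 60)  # Step 1: 45 min -> 3/4 h, kept exact
    return Fraction(distance_km) / hours  # Step 2: distance / time


print(speed_kmh(60, 45))  # 80
```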

---

## 🧬 Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: ×10 FullFT)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4 pp)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
```
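
The two training metrics in the diagram use different kinds of change: the loss figure is a relative change, while the accuracy figure is an absolute difference in percentage points. The sketch below recomputes both from the numbers shown above. The function names are hypothetical, and reading "×10 FullFT" as "ten times the learning rate one would use for full fine-tuning" is an interpretation, not something the README spells out.

```python
def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def point_change(before_pct, after_pct):
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


loss_change = relative_change_pct(1.33, 1.10)
acc_change = point_change(68, 72)
# Assumed reading of the V8 rule: LoRA LR = 10 x the full fine-tuning LR.
implied_full_ft_lr = 2e-4 / 10

print(f"loss: {loss_change:.1f}%")                      # loss: -17.3%
print(f"accuracy: {acc_change:+d} pp")                  # accuracy: +4 pp
print(f"implied full-FT LR: {implied_full_ft_lr:.0e}")  # implied full-FT LR: 2e-05
```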

---

## 📊 Training Data Composition

| Dataset | Samples | Source teacher |
|---------|--------|------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 |
| [TeichAI/Claude-Opus-4.6-Reasoning-887x](https://huggingface.co/datasets/TeichAI/Claude-Opus-4.6-Reasoning-887x) | 887 | Claude Opus 4.6 |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Claude Opus 4.5 |
| [TeichAI/Claude-Sonnet-4.6-Reasoning-1100x](https://huggingface.co/datasets/TeichAI/Claude-Sonnet-4.6-Reasoning-1100x) | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |
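
The per-dataset counts sum to more than the stated total. That is consistent with the "after filtering" label if the per-dataset figures are pre-filter sizes, but this reading is an assumption: the README does not describe the filter itself. A quick check of the numbers in the table:

```python
# Sample counts copied from the table above.
samples = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "TeichAI/Claude-Opus-4.6-Reasoning-887x": 887,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "TeichAI/Claude-Sonnet-4.6-Reasoning-1100x": 1100,
}

listed_total = sum(samples.values())
stated_total = 4451
# Assumption: the gap is what the (undocumented) filtering step removed.
removed = listed_total - stated_total

print(listed_total, stated_total, removed)  # 4563 4451 112
for name, count in samples.items():
    print(f"{name}: {count / listed_total:.1%} of listed samples")
```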

---

## 🎯 Design Philosophy (Darwin V8)

1. **LoRA Without Regret**: `all-linear` target, high LR; a small rank is sufficient
2. **Response Distillation**: cost-efficient distillation from pre-generated Opus traces
3. **Merge-and-Deploy**: once the LoRA adapter is merged, the model deploys without extra dependencies
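
To make the first point concrete, the sketch below works out what rank 16 with α=32 means for a single linear layer under the standard LoRA formulation: the update is scaled by α/r, and the adapter adds r·(d_in + d_out) trainable parameters next to d_in·d_out frozen ones. The 2048×2048 layer shape is a hypothetical placeholder, not the actual Qwen3.5-2B dimension, and `lora_stats` is an illustrative name.

```python
def lora_stats(d_in, d_out, rank, alpha):
    """Scaling factor and parameter counts for one LoRA-adapted linear layer."""
    base_params = d_in * d_out            # frozen weight matrix W
    lora_params = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return {
        "scaling": alpha / rank,          # standard LoRA scaling, assumed here
        "lora_params": lora_params,
        "base_params": base_params,
        "trainable_fraction": lora_params / base_params,
    }


# Hypothetical 2048 x 2048 projection with the README's r=16, alpha=32.
stats = lora_stats(d_in=2048, d_out=2048, rank=16, alpha=32)
print(stats)
```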

---

## 🔁 How to Reproduce

This model was created by merging the following two components:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Father: the base model, loaded in bf16.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)
# Mother: the LoRA adapter, attached on top of the base model.
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)
# Fold the adapter into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
```
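
Conceptually, merging folds the low-rank update into each adapted weight matrix, W' = W + (α/r)·B·A, so the merged layer produces the same output as base-plus-adapter without needing PEFT at inference time. The toy example below illustrates that identity with plain Python lists, assuming the standard LoRA formulation. The matrices and helper names are made up for illustration; this is not PEFT's actual implementation.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, scale=1.0):
    """Elementwise X + scale * Y for two equally shaped matrices."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


rank, alpha = 1, 2
W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (d_out x d_in)
B = [[1.0], [0.5]]            # LoRA B (d_out x rank)
A = [[0.5, -1.0]]             # LoRA A (rank x d_in)
x = [[2.0], [3.0]]            # input as a column vector

scale = alpha / rank
# Adapter attached: y = W x + scale * B (A x)
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scale)
# Adapter merged: W' = W + scale * B A, then y = W' x
W_merged = add(W, matmul(B, A), scale)
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)  # [[4.0], [16.0]] [[4.0], [16.0]]
```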

---

## 📝 Sample Test Results (4 problems)

| Type | Correct? | Response length |
|-----|---------|---------|
| Math (train speed) | ✅ 80 km/h | 771 chars |
| Logic (height comparison) | ✅ Carol | 354 chars |
| Code (primality test) | ✅ Python function | 1,712 chars |
| Korean (minimum hourly wage) | ✅ 1,577,600 won | 142 chars |

**The model naturally produces structured answers using Markdown, LaTeX, and step-by-step formatting.**

---

## ⚠️ Limitations

- **Scale**: 2.3B parameters (a small model)
- **Arithmetic in Korean**: numeric errors can occasionally occur (a small-model limitation)
- **Long context**: trained with max_length=4,096
- **`<think>` tags**: rarely emitted explicitly (reasoning is woven into the main response)
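
Because explicit `<think>` blocks appear only occasionally, downstream code that wants just the final answer should handle both cases. The helper below is a minimal sketch under that assumption; `strip_think` is a hypothetical name, and it assumes any reasoning block is wrapped in a literal `<think>...</think>` pair.

```python
import re

# Matches a <think>...</think> block (across newlines) plus trailing whitespace.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text):
    """Remove explicit <think> blocks if present; otherwise return text unchanged."""
    return _THINK_BLOCK.sub("", text).strip()


print(strip_think("<think>45 min is 0.75 h</think>\nThe speed is 80 km/h."))
print(strip_think("The speed is 80 km/h."))
```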

---

## 🪪 License

- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's own license
- This model: Apache 2.0

---

## 🙏 Credits

- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Datasets published by**: nohurry, TeichAI
- **Training & Release**: FINAL-Bench / VIDRAFT_LAB
188
+ - **Training & Release**: FINAL-Bench / VIDRAFT_LAB
189
+
190
+ ---
191
+
192
+ ## ๐Ÿ”— ๊ด€๋ จ ๋ชจ๋ธ
193
+
194
+ - ๐Ÿง  [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1) โ€” ์ด ๋ชจ๋ธ์˜ **LoRA ์–ด๋Œ‘ํ„ฐ ๋‹จ๋… ๋ฒ„์ „**
195
+ - โšก [`FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1) โ€” Phase 4 SDPO ์ž๊ธฐ์ฆ๋ฅ˜ ๊ฐ•ํ™”๋ณธ
196
+
197
+ ---
198
+
199
+ *Darwin V8 ยท Part of the evolutionary model merging series by VIDRAFT_LAB*