Update README.md
README.md CHANGED
````diff
@@ -101,25 +101,6 @@ deepspeed train.py \
     --trust_remote_code
 ```
 
-## 🔧 Regenerating This Model
-
-To recreate this converted model:
-
-```bash
-# From the TransMLA root directory
-bash scripts/convert/llama3.2-1B.sh
-```
-
-Or manually using the converter:
-
-```bash
-python transmla/converter.py \
-    --model-path meta-llama/Llama-3.2-1B \
-    --save-path BarraHome/llama3_2-1B-deepseek \
-    --freqfold 4 \
-    --ppl-eval-batch-size 16
-```
-
 ## 💡 Key Benefits
 
 - **Memory Efficiency**: ~50% reduction in KV cache memory usage
@@ -128,14 +109,6 @@ python transmla/converter.py \
 - **Quality Preservation**: Maintains comparable performance to original model
 - **Hardware Optimization**: Optimized for H100 and similar accelerators
 
-## ⚠️ Requirements
-
-- **Python**: 3.8+
-- **PyTorch**: 2.0+
-- **Transformers**: 4.30+
-- **CUDA**: 11.7+ (for GPU acceleration)
-- **Memory**: 8GB+ GPU memory recommended
-
 Optional:
 - **vLLM**: For optimized inference
 - **DeepSpeed**: For distributed training
````
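For quick verification after conversion, the converted checkpoint loads through the standard `transformers` flow. The snippet below is a minimal sketch under two assumptions: the repo id `BarraHome/llama3_2-1B-deepseek` matches the `--save-path` from the converter command, and the checkpoint ships custom MLA modeling code, hence `trust_remote_code=True` (mirroring the `--trust_remote_code` flag in the training example).

```python
# Minimal sketch: load the MLA-converted checkpoint and run a short generation.
# Assumptions: the repo id below matches the converter's --save-path, and the
# checkpoint ships custom modeling code (so trust_remote_code=True is needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BarraHome/llama3_2-1B-deepseek"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    device_map="auto",       # place layers on GPU when one is available
    trust_remote_code=True,  # needed for the converted MLA attention layers
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```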
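The README also lists vLLM as an optional path for optimized inference. A rough sketch of that route is below; it assumes vLLM accepts the converted checkpoint's custom modeling code via `trust_remote_code` and reuses the same repo id as above.

```python
# Sketch only: offline batch inference with vLLM.
# Assumption: vLLM loads the converted checkpoint with trust_remote_code=True.
from vllm import LLM, SamplingParams

llm = LLM(model="BarraHome/llama3_2-1B-deepseek", trust_remote_code=True)
sampling = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain the KV-cache savings of MLA in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```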