Update README.md
README.md CHANGED
````diff
@@ -101,25 +101,6 @@ deepspeed train.py \
     --trust_remote_code
 ```
 
-## 🔧 Regenerating This Model
-
-To recreate this converted model:
-
-```bash
-# From the TransMLA root directory
-bash scripts/convert/llama3.2-1B.sh
-```
-
-Or manually using the converter:
-
-```bash
-python transmla/converter.py \
-    --model-path meta-llama/Llama-3.2-1B \
-    --save-path BarraHome/llama3_2-1B-deepseek \
-    --freqfold 4 \
-    --ppl-eval-batch-size 16
-```
-
 ## 💡 Key Benefits
 
 - **Memory Efficiency**: ~50% reduction in KV cache memory usage
@@ -128,14 +109,6 @@ python transmla/converter.py \
 - **Quality Preservation**: Maintains comparable performance to original model
 - **Hardware Optimization**: Optimized for H100 and similar accelerators
 
-## ⚠️ Requirements
-
-- **Python**: 3.8+
-- **PyTorch**: 2.0+
-- **Transformers**: 4.30+
-- **CUDA**: 11.7+ (for GPU acceleration)
-- **Memory**: 8GB+ GPU memory recommended
-
 Optional:
 - **vLLM**: For optimized inference
 - **DeepSpeed**: For distributed training
````
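For quick verification after conversion, the converted checkpoint loads through the standard `transformers` flow. The snippet below is a minimal sketch under two assumptions: the repo id `BarraHome/llama3_2-1B-deepseek` matches the `--save-path` from the converter command, and the checkpoint ships custom MLA modeling code, hence `trust_remote_code=True` (mirroring the `--trust_remote_code` flag in the training example).

```python
# Minimal sketch: load the MLA-converted checkpoint and run a short generation.
# Assumptions: the repo id below matches the converter's --save-path, and the
# checkpoint ships custom modeling code (so trust_remote_code=True is needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BarraHome/llama3_2-1B-deepseek"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    device_map="auto",       # place layers on GPU when one is available
    trust_remote_code=True,  # needed for the converted MLA attention layers
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```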
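The README also lists vLLM as an optional path for optimized inference. A rough sketch of that route is below; it assumes vLLM accepts the converted checkpoint's custom modeling code via `trust_remote_code` and reuses the same repo id as above.

```python
# Sketch only: offline batch inference with vLLM.
# Assumption: vLLM loads the converted checkpoint with trust_remote_code=True.
from vllm import LLM, SamplingParams

llm = LLM(model="BarraHome/llama3_2-1B-deepseek", trust_remote_code=True)
sampling = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain the KV-cache savings of MLA in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```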