---
library_name: peft
base_model: google/gemma-3-4b-it
language:
  - nso
  - en
tags:
  - translation
  - african-languages
  - scientific-translation
  - afriscience-mt
  - lora
  - peft
  - gemma
license: apache-2.0
pipeline_tag: translation
model-index:
  - name: gemma_3_4b_it-lora-r16-nso-eng
    results:
      - task:
          type: translation
        metrics:
          - name: BLEU (test)
            type: bleu
            value: 37.07
          - name: chrF (test)
            type: chrf
            value: 57.33
          - name: SSA-COMET (test)
            type: comet
            value: 65.06
---

# gemma_3_4b_it-lora-r16-nso-eng

[![Model on HF](https://huggingface.co/datasets/huggingface/badges/raw/main/model-on-hf-sm.svg)](https://huggingface.co/AfriScience-MT/gemma_3_4b_it-lora-r16-nso-eng)

This is a **LoRA adapter** for the AfriScience-MT project, enabling efficient scientific machine translation for African languages.

## Adapter Description

| Property | Value |
|----------|-------|
| **Base Model** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) |
| **Translation Direction** | Northern Sotho → English |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **Training Method** | QLoRA (4-bit quantization) |
| **Domain** | Scientific/Academic texts |

### Why LoRA?

LoRA (Low-Rank Adaptation) enables efficient fine-tuning by training only a small number of additional parameters. This adapter adds only **~8.0M parameters** to the base model while achieving strong translation performance.
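As a rough sketch, the adapter setup described above corresponds to a PEFT `LoraConfig` like the one below. The rank, alpha, dropout, and target modules are taken from this card's hyperparameter table; the `bias` and `task_type` values are assumptions, and the actual training script lives in the AfriScience-MT repository.

```python
# Illustrative LoRA configuration matching this card's hyperparameters.
# bias="none" and task_type="CAUSAL_LM" are assumptions, not confirmed
# settings from the training run.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                # low-rank dimension
    lora_alpha=32,       # scaling factor (effective scale = alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```

Passing this config to `peft.get_peft_model(base_model, lora_config)` would wrap the frozen base model so that only the low-rank adapter matrices are trained.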
## Evaluation Results

Performance on the AfriScience-MT test set:

| Split | BLEU | chrF | SSA-COMET |
|-------|------|------|-----------|
| Validation | 41.69 | 61.46 | 66.46 |
| **Test** | **37.07** | **57.33** | **65.06** |

**Metrics explanation:**

- **BLEU**: Measures n-gram overlap with reference translations (0-100, higher is better)
- **chrF**: Character-level F-score, robust for morphologically rich languages (0-100, higher is better)
- **SSA-COMET**: Neural metric trained for Sub-Saharan African languages, shown as a percentage (0-100, higher is better) ([McGill-NLP/ssa-comet-stl](https://huggingface.co/McGill-NLP/ssa-comet-stl))

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization (recommended for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Load LoRA adapter
adapter_name = "AfriScience-MT/gemma_3_4b_it-lora-r16-nso-eng"
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

# Prepare translation prompt (the placeholder below is illustrative;
# replace it with your Northern Sotho source text)
source_text = "Climate change significantly impacts agricultural productivity in sub-Saharan Africa."
instruction = "Translate the following Northern Sotho scientific text to English."
# Format for Gemma chat template
messages = [{"role": "user", "content": f"{instruction}\n\n{source_text}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate translation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the generated part
generated = outputs[0][inputs["input_ids"].shape[1]:]
translation = tokenizer.decode(generated, skip_special_tokens=True)
print(translation)
```

### Without Quantization (Full Precision)

```python
# For GPUs with sufficient memory (~24 GB or more; see the
# hardware requirements table below)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "AfriScience-MT/gemma_3_4b_it-lora-r16-nso-eng")
```

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch Size | 2 |
| Learning Rate | 2e-04 |
| Max Sequence Length | 512 |
| Gradient Accumulation | 4 |

### Hardware Requirements

| Configuration | VRAM Required |
|---------------|---------------|
| 4-bit (QLoRA) | ~8-12 GB |
| 8-bit | ~16-20 GB |
| Full precision | ~24-40 GB |

## Reproducibility

To reproduce this adapter:

```bash
# Clone the AfriScience-MT repository
git clone https://github.com/afriscience-mt/afriscience-mt.git
cd afriscience-mt

# Install dependencies
pip install -r requirements.txt

# Run LoRA training
python -m afriscience_mt.scripts.run_lora_training \
    --data_dir ./data \
    --source_lang nso \
    --target_lang eng \
    --model_name google/gemma-3-4b-it \
    --model_type gemma \
    --lora_rank 16 \
    --output_dir ./output \
    --num_epochs 3 \
    --batch_size 4 \
    --load_in_4bit
```

## Limitations

- **Domain Specificity**: Optimized for scientific/academic texts; may underperform on casual or colloquial language.
- **Language Direction**: Only supports Northern Sotho → English translation.
- **Base Model Required**: Must be used with the [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) base model.
- **Context Length**: Maximum context is model-dependent; longer texts should be chunked.

## Citation

If you use this adapter, please cite the AfriScience-MT project:

```bibtex
@inproceedings{afriscience-mt-2025,
  title={AfriScience-MT: Machine Translation for African Scientific Literature},
  author={AfriScience-MT Team},
  year={2025},
  url={https://github.com/afriscience-mt/afriscience-mt}
}
```

## License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

- Base model: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- LoRA implementation: [PEFT](https://github.com/huggingface/peft)
- Evaluation: [SSA-COMET](https://huggingface.co/McGill-NLP/ssa-comet-stl) for African language assessment