English-to-Telugu Translator (NLLB Fine-Tuned)
This is a sequence-to-sequence translation model fine-tuned from facebook/nllb-200-distilled-600M to translate from English to Telugu.
This model was developed as part of the SOAI 2025 Internship Program. The primary goal was to create a useful translation tool that also serves as a "Corpus Collection Engine" to gather human-verified translations, thereby helping to improve future AI models for Indian languages.
Model Details
- Base Model: facebook/nllb-200-distilled-600M
- Source Language: English (eng_Latn)
- Target Language: Telugu (tel_Telu)
- Fine-tuning Data: A custom dataset of English-Telugu parallel sentences.
How to Use
You can use this model directly with the transformers library's translation pipeline:
```python
from transformers import pipeline

# Use the model ID for your fine-tuned model
model_id = "VijayChandra/english-to-telugu-translator-nllb"

# Create a translation pipeline
translator = pipeline(
    "translation",
    model=model_id,
    src_lang="eng_Latn",
    tgt_lang="tel_Telu",
)

# Translate a sentence
english_text = "Hello, how are you?"
telugu_translation = translator(english_text)

print(f"English: {english_text}")
print(f"Telugu: {telugu_translation[0]['translation_text']}")
# Expected Output: Telugu: హలో, మీరు ఎలా ఉన్నారు?
```
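If you need more control over generation (for example, batching or beam search), the model can also be driven through the lower-level tokenizer/model API. The sketch below assumes the standard NLLB convention of selecting the output language by forcing the target language tag as the first generated token via `forced_bos_token_id`; the `max_length` value is an arbitrary illustrative choice.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "VijayChandra/english-to-telugu-translator-nllb"

# src_lang tells the NLLB tokenizer to prepend the English language tag
# to the input sequence.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")

# NLLB models pick the output language by forcing the target language
# tag (here tel_Telu) as the first decoder token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("tel_Telu"),
    max_length=64,
)

translation = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
print(translation)
```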
Training Procedure
The model was fine-tuned using the Seq2SeqTrainer from the Hugging Face transformers library. The training process involved preparing a parallel corpus of English and Telugu sentences and training for 3 epochs with a learning rate of 3e-5.
Framework versions
- Transformers 4.40.0
- PyTorch 2.6.0
- Datasets 2.19.0
- Tokenizers 0.19.1