# XLS-R Lao ASR
Fine-tuned XLS-R-300M model for Lao automatic speech recognition, achieving 15.14% CER on test data.
## Model Details
This model is fine-tuned from facebook/wav2vec2-xls-r-300m using the SiangLao/lao-asr-thesis-dataset.
### Training Configuration
- Epochs: 15
- Batch Size: 16
- Learning Rate: 1e-4
- Training Date: June 3, 2025
- Vocabulary Size: 55 Lao characters + special tokens
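For CTC fine-tuning of wav2vec2 models, a character vocabulary like the one above is typically built by collecting every distinct character in the training transcripts and appending special tokens. A minimal sketch of that step (the helper name and example transcripts are illustrative, not taken from the actual dataset):

```python
def build_vocab(transcripts):
    """Build a char-to-id map from transcripts, CTC-style."""
    chars = sorted(set("".join(transcripts)))
    vocab = {c: i for i, c in enumerate(chars)}
    # Replace the space character with the word delimiter "|",
    # then append the special tokens wav2vec2's tokenizer expects.
    vocab["|"] = vocab.pop(" ", len(vocab))
    vocab["<unk>"] = len(vocab)
    vocab["<pad>"] = len(vocab)
    return vocab

vocab = build_vocab(["ສະບາຍດີ", "ລາວ"])
```

Running this over the full training transcripts is what yields the 55-character + special-token vocabulary reported above.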
## Performance
| Split | CER |
|---|---|
| Validation | 13.67% |
| Test | 15.14% |
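CER (character error rate) is the character-level Levenshtein edit distance between reference and hypothesis, divided by the reference length. A minimal, dependency-free sketch of the metric (libraries such as jiwer compute the same quantity):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate via Levenshtein edit distance."""
    r, h = list(reference), list(hypothesis)
    # Rolling DP row: d[j] = edits to turn r[:i] into h[:j]
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev + cost)   # substitution / match
            prev = cur
    return d[len(h)] / max(len(r), 1)

cer("ສະບາຍດີ", "ສະບາຍດີ")  # → 0.0
```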
## Usage
```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("SiangLao/xls-r-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/xls-r-lao-asr")

# Load audio, resampled to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Process audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate prediction
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]

# Clean transcription
transcription = transcription.replace("<unk>", " ").strip()
print(transcription)
```
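The `argmax` plus `batch_decode` step above performs greedy CTC decoding: consecutive repeated ids are collapsed, then blank tokens are dropped. A toy sketch of that collapse rule (the `blank_id` default is an assumption; the processor handles this internally):

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse repeated ids, then drop blanks, as greedy CTC decoding does."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])  # → [1, 1, 2]
```

Note that a repeated character in the output requires a blank between the repeats in the frame-level ids, which is why the blank token exists at all.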
## Citation
```bibtex
@thesis{naovalath2025lao,
  title   = {Lao Automatic Speech Recognition using Transfer Learning},
  author  = {Souphaxay Naovalath and Sounmy Chanthavong},
  advisor = {Dr. Somsack Inthasone},
  school  = {National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year    = {2025}
}
```