# XLM-RoBERTa for Algerian Darija Misinformation Detection

## Model Description

Fine-tuned XLM-RoBERTa-base model for detecting misinformation in Algerian Darija text. The model handles code-switching between Arabic, French, and Latin scripts, making it particularly suitable for multilingual Algerian social media content.

- **Base Model:** xlm-roberta-base
- **Model ID:** 3.2
- **Task:** Multi-class text classification (5 classes)
## Classes
- F (Fake): False or fabricated information
- R (Real): Factual information or news reporting
- N (Non-news): Non-informational content
- M (Misleading): Partially true but misleading content
- S (Satire): Satirical or humorous content
## Performance
| Metric | Score |
|---|---|
| Accuracy | 76.17% |
| Macro F1 | 67.33% |
| Macro Precision | 67.45% |
| Macro Recall | 67.94% |
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| F (Fake) | 90.46% | 78.68% | 84.16% | 952 |
| R (Real) | 76.97% | 78.42% | 77.69% | 848 |
| N (Non-news) | 84.92% | 76.83% | 80.67% | 872 |
| M (Misleading) | 56.52% | 73.74% | 63.99% | 594 |
| S (Satire) | 28.41% | 32.05% | 30.12% | 78 |
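As a quick sanity check on the table above, the macro scores are (up to rounding) the unweighted means of the per-class values, ignoring support:

```python
# Per-class F1 scores from the table above (F, R, N, M, S), in percent.
per_class_f1 = [84.16, 77.69, 80.67, 63.99, 30.12]

# Macro F1 averages over classes with equal weight, ignoring support,
# which is why the rare S class drags it well below the accuracy.
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(f"Macro F1: {macro_f1:.2f}%")  # ≈ 67.33%, matching the summary table
```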
## Usage
```python
# test_model_3_2.py
import os

# CRITICAL: disable TensorFlow before importing transformers
os.environ['USE_TF'] = '0'
os.environ['USE_TORCH'] = '1'

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load from the Hugging Face Hub
REPO_ID = "aurelius2023/xlm-roberta-algerian-misinformation"

print("=" * 70)
print("MODEL 3.2: XLM-RoBERTa Algerian Misinformation Detection")
print("=" * 70)
print("\nLoading model from Hugging Face Hub...")

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)
model.eval()

print("✓ Model loaded successfully!")
print(f"✓ Model: {REPO_ID}")
print("✓ Base: xlm-roberta-base")
print("✓ Performance: F1=0.6733, Accuracy=0.7617")

# Label mappings
label_map = {0: 'F', 1: 'R', 2: 'N', 3: 'M', 4: 'S'}
label_names = {
    'F': 'Fake',
    'R': 'Real',
    'N': 'Non-news',
    'M': 'Misleading',
    'S': 'Satire',
}

# Test with multiple examples
test_examples = [
    # "The Algerian Youth Minister reveals that European countries are asking
    # Algeria for solutions to their young people's social problems"
    "وزير الشباب الجزائري يكشف ان الدول الاوروبيه تطلب من الجزائر حلولًا لمعالجه المشكلات الاجتماعيه لشبابها",
    # "Algeria won the 2024 World Cup"
    "الجزائر فازت بكأس العالم 2024",
    # "The government announced new reforms in the education sector"
    "الحكومة أعلنت عن إصلاحات جديدة في قطاع التعليم",
]

print("\n" + "=" * 70)
print("TEST PREDICTIONS")
print("=" * 70)

for i, text in enumerate(test_examples, 1):
    print(f"\n--- Test {i} ---")
    print(f"Text: {text[:80]}{'...' if len(text) > 80 else ''}")

    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", max_length=128,
                       truncation=True, padding=True)

    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0]
        pred = torch.argmax(probs).item()
        confidence = probs[pred].item()

    # Display results
    predicted_label = label_map[pred]
    predicted_name = label_names[predicted_label]
    print(f"Predicted: {predicted_name} ({predicted_label})")
    print(f"Confidence: {confidence:.2%}")

    # Show all probabilities
    print("All probabilities:")
    for idx in range(5):
        label = label_map[idx]
        name = label_names[label]
        prob = probs[idx].item()
        bar = "█" * int(prob * 20)
        print(f"  {label} ({name:12s}): {prob:6.2%} {bar}")

print("\n" + "=" * 70)
print("✅ ALL TESTS COMPLETED SUCCESSFULLY!")
print("=" * 70)
print("\n📊 Model Performance Summary:")
print("  • Accuracy: 76.17%")
print("  • Macro F1: 67.33%")
print("  • Best classes: F (84.16%), R (77.69%), N (80.67%)")
print("  • Challenge: S class (30.12%) - limited training data")
print(f"\n🔗 Model Card: https://huggingface.co/{REPO_ID}")
```
## Class Imbalance Handling
- Weighted loss function using class weights
- Emphasis on minority classes (M, S)
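The weighted loss described above can be sketched as follows. This is a minimal illustration, not the card's actual training code: the class counts are hypothetical (borrowed from the test-set supports above; the real training distribution is not published here), and inverse-frequency weighting is one common choice among several.

```python
import torch
import torch.nn as nn

# Hypothetical per-class counts (F, R, N, M, S) -- the test-set supports
# from the table above, used here only to illustrate the idea.
class_counts = torch.tensor([952.0, 848.0, 872.0, 594.0, 78.0])

# Inverse-frequency weights: rarer classes (M and especially S)
# receive proportionally larger weights.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Plug the weights into the standard cross-entropy loss.
criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 4 examples, 5-class logits, gold labels.
logits = torch.randn(4, 5)
labels = torch.tensor([0, 3, 4, 1])
loss = criterion(logits, labels)
print(f"S-class weight: {weights[4]:.2f}, example loss: {loss.item():.3f}")
```

With these counts the S class ends up weighted roughly 12x more heavily than the majority F class, which is what pushes the model to attend to minority classes during fine-tuning.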
## Comparison with DziriBERT
| Metric | DziriBERT (3.1) | XLM-RoBERTa (3.2) |
|---|---|---|
| Macro F1 | 67.49% | 67.33% |
| Accuracy | 77.27% | 76.17% |
| F class F1 | 84.75% | 84.16% |
| M class F1 | 61.53% | 63.99% |
| S class F1 | 30.08% | 30.12% |
## Acknowledgments
- Base model: xlm-roberta-base
- Dataset: Custom Algerian Darija misinformation dataset
- Framework: Hugging Face Transformers
## Model Card Contact
For questions, issues, or collaboration opportunities, please open an issue on the model repository.
Last Updated: December 2025