XLM-RoBERTa for Algerian Darija Misinformation Detection

Model Description

Fine-tuned XLM-RoBERTa-base model for detecting misinformation in Algerian Darija text. The model handles code-switching between Arabic, French, and Latin scripts, making it well suited to multilingual Algerian social media content.

Base Model: xlm-roberta-base

Model ID: 3.2

Task: Multi-class text classification (5 classes)

Classes

  • F (Fake): False or fabricated information
  • R (Real): Factual information or news reporting
  • N (Non-news): Non-informational content
  • M (Misleading): Partially true but misleading content
  • S (Satire): Satirical or humorous content

Performance

Metric           Score
Accuracy         76.17%
Macro F1         67.33%
Macro Precision  67.45%
Macro Recall     67.94%

Per-Class Performance

Class           Precision   Recall   F1-Score   Support
F (Fake)           90.46%   78.68%     84.16%       952
R (Real)           76.97%   78.42%     77.69%       848
N (Non-news)       84.92%   76.83%     80.67%       872
M (Misleading)     56.52%   73.74%     63.99%       594
S (Satire)         28.41%   32.05%     30.12%        78
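
Per-class and macro metrics of this form are typically computed with scikit-learn's classification_report. The sketch below is illustrative only: y_true and y_pred are hypothetical placeholders, not the actual evaluation data, and this is not the original evaluation script.

# metrics_sketch.py -- illustrative, not the original evaluation script
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical placeholders: gold labels and model predictions,
# encoded with the same ids as the model (0=F, 1=R, 2=N, 3=M, 4=S)
y_true = [0, 1, 2, 3, 4, 0, 1, 2]
y_pred = [0, 1, 2, 3, 0, 0, 1, 3]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")

# Per-class precision/recall/F1 plus macro averages, as in the tables above
print(classification_report(
    y_true, y_pred,
    target_names=['F (Fake)', 'R (Real)', 'N (Non-news)',
                  'M (Misleading)', 'S (Satire)'],
    digits=4,
))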

Usage

# test_model_3_2.py
import os

# CRITICAL: Disable TensorFlow before importing transformers
os.environ['USE_TF'] = '0'
os.environ['USE_TORCH'] = '1'

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load from HuggingFace Hub
REPO_ID = "aurelius2023/xlm-roberta-algerian-misinformation"

print("="*70)
print("MODEL 3.2: XLM-RoBERTa Algerian Misinformation Detection")
print("="*70)

print("
Loading model from Hugging Face Hub...")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)

print("✓ Model loaded successfully!")
print(f"✓ Model: {REPO_ID}")
print(f"✓ Base: xlm-roberta-base")
print(f"✓ Performance: F1=0.6733, Accuracy=0.7617")

# Label mappings
label_map = {0: 'F', 1: 'R', 2: 'N', 3: 'M', 4: 'S'}
label_names = {
    'F': 'Fake',
    'R': 'Real',
    'N': 'Non-news',
    'M': 'Misleading',
    'S': 'Satire'
}

# Test with multiple examples (English glosses in comments)
test_examples = [
    # "The Algerian Youth Minister reveals that European countries are asking
    #  Algeria for solutions to their own youth's social problems"
    "وزير الشباب الجزائري يكشف ان الدول الاوروبيه تطلب من الجزائر حلولًا لمعالجه المشكلات الاجتماعيه لشبابها",
    # "Algeria won the 2024 World Cup"
    "الجزائر فازت بكأس العالم 2024",
    # "The government announced new reforms in the education sector"
    "الحكومة أعلنت عن إصلاحات جديدة في قطاع التعليم"
]

print("
" + "="*70)
print("TEST PREDICTIONS")
print("="*70)

for i, text in enumerate(test_examples, 1):
    print(f"
--- Test {i} ---")
    print(f"Text: {text[:80]}{'...' if len(text) > 80 else ''}")
    
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", max_length=128, 
                      truncation=True, padding=True)
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0]
        pred = torch.argmax(probs).item()
        confidence = probs[pred].item()
    
    # Display results
    predicted_label = label_map[pred]
    predicted_name = label_names[predicted_label]
    
    print(f"Predicted: {predicted_name} ({predicted_label})")
    print(f"Confidence: {confidence:.2%}")
    
    # Show all probabilities
    print("All probabilities:")
    for idx in range(5):
        label = label_map[idx]
        name = label_names[label]
        prob = probs[idx].item()
        bar = "█" * int(prob * 20)
        print(f"  {label} ({name:12s}): {prob:6.2%} {bar}")

print("
" + "="*70)
print("✅ ALL TESTS COMPLETED SUCCESSFULLY!")
print("="*70)

print("
📊 Model Performance Summary:")
print("  • Accuracy: 76.17%")
print("  • Macro F1: 67.33%")
print("  • Best classes: F (84.16%), R (77.69%), N (80.67%)")
print("  • Challenge: S class (30.12%) - limited training data")

print(f"
🔗 Model Card: https://huggingface.co/{REPO_ID}")
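
As a lighter-weight alternative to the script above, the Transformers pipeline API can wrap the same checkpoint in one call. This is a sketch: the label strings it returns depend on the id2label mapping stored in the repository's config, and fall back to generic names such as LABEL_0 if that mapping is missing.

# Alternative: one-liner inference via the Transformers pipeline API
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="aurelius2023/xlm-roberta-algerian-misinformation",
)

# "Algeria won the 2024 World Cup" (same example as above)
result = classifier("الجزائر فازت بكأس العالم 2024")
print(result)  # e.g. [{'label': '...', 'score': 0.97}]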

Class Imbalance Handling

  • Weighted loss function using class weights (see the sketch below)
  • Emphasis on minority classes (M, S)
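
The training code is not published with this card, so the following is only a minimal sketch of one common way to implement a class-weighted loss with the Hugging Face Trainer; the weight values and the WeightedTrainer class are illustrative assumptions, not the authors' actual setup.

# weighted_loss_sketch.py -- illustrative, not this model's actual training code
import torch
from torch.nn import CrossEntropyLoss
from transformers import Trainer

# Illustrative inverse-frequency weights for the five classes (F, R, N, M, S);
# the weights actually used to train this model are not published
CLASS_WEIGHTS = torch.tensor([0.70, 0.79, 0.77, 1.13, 8.57])

class WeightedTrainer(Trainer):
    """Trainer subclass that swaps in a class-weighted cross-entropy loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        loss_fct = CrossEntropyLoss(weight=CLASS_WEIGHTS.to(logits.device))
        loss = loss_fct(logits.view(-1, model.config.num_labels),
                        labels.view(-1))
        return (loss, outputs) if return_outputs else loss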

Comparison with DziriBERT

Metric       DziriBERT (3.1)   XLM-RoBERTa (3.2)
Macro F1              67.49%              67.33%
Accuracy              77.27%              76.17%
F class F1            84.75%              84.16%
M class F1            61.53%              63.99%
S class F1            30.08%              30.12%

Acknowledgments

  • Base model: xlm-roberta-base
  • Dataset: Custom Algerian Darija misinformation dataset
  • Framework: Hugging Face Transformers

Model Card Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.


Last Updated: December 2025
