XLM-RoBERTa for Algerian Darija Misinformation Detection

Model Description

Fine-tuned XLM-RoBERTa-base model for detecting misinformation in Algerian Darija text. The model handles code-switching between Arabic, French, and Latin scripts, making it well suited to multilingual Algerian social media content.

Base Model: xlm-roberta-base

Model ID: 3.2

Task: Multi-class text classification (5 classes)

Classes

  • F (Fake): False or fabricated information
  • R (Real): Factual information or news reporting
  • N (Non-news): Non-informational content
  • M (Misleading): Partially true but misleading content
  • S (Satire): Satirical or humorous content

Performance

Metric           Score
Accuracy         76.17%
Macro F1         67.33%
Macro Precision  67.45%
Macro Recall     67.94%

Per-Class Performance

Class           Precision   Recall   F1-Score   Support
F (Fake)           90.46%   78.68%     84.16%       952
R (Real)           76.97%   78.42%     77.69%       848
N (Non-news)       84.92%   76.83%     80.67%       872
M (Misleading)     56.52%   73.74%     63.99%       594
S (Satire)         28.41%   32.05%     30.12%        78
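
Per-class and macro metrics of this form are typically computed with scikit-learn's classification_report. The sketch below is illustrative only: y_true and y_pred are hypothetical placeholders, not the actual evaluation data, and this is not the original evaluation script.

# metrics_sketch.py -- illustrative, not the original evaluation script
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical placeholders: gold labels and model predictions,
# encoded with the same ids as the model (0=F, 1=R, 2=N, 3=M, 4=S)
y_true = [0, 1, 2, 3, 4, 0, 1, 2]
y_pred = [0, 1, 2, 3, 0, 0, 1, 3]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")

# Per-class precision/recall/F1 plus macro averages, as in the tables above
print(classification_report(
    y_true, y_pred,
    target_names=['F (Fake)', 'R (Real)', 'N (Non-news)',
                  'M (Misleading)', 'S (Satire)'],
    digits=4,
))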

Usage

# test_model_3_2.py
import os

# CRITICAL: Disable TensorFlow before importing transformers
os.environ['USE_TF'] = '0'
os.environ['USE_TORCH'] = '1'

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load from HuggingFace Hub
REPO_ID = "aurelius2023/xlm-roberta-algerian-misinformation"

print("="*70)
print("MODEL 3.2: XLM-RoBERTa Algerian Misinformation Detection")
print("="*70)

print("
Loading model from Hugging Face Hub...")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)

print("✓ Model loaded successfully!")
print(f"✓ Model: {REPO_ID}")
print(f"✓ Base: xlm-roberta-base")
print(f"✓ Performance: F1=0.6733, Accuracy=0.7617")

# Label mappings
label_map = {0: 'F', 1: 'R', 2: 'N', 3: 'M', 4: 'S'}
label_names = {
    'F': 'Fake',
    'R': 'Real',
    'N': 'Non-news',
    'M': 'Misleading',
    'S': 'Satire'
}

# Test with multiple examples (English glosses in comments)
test_examples = [
    # "The Algerian Youth Minister reveals that European countries are asking
    #  Algeria for solutions to their own youth's social problems"
    "وزير الشباب الجزائري يكشف ان الدول الاوروبيه تطلب من الجزائر حلولًا لمعالجه المشكلات الاجتماعيه لشبابها",
    # "Algeria won the 2024 World Cup"
    "الجزائر فازت بكأس العالم 2024",
    # "The government announced new reforms in the education sector"
    "الحكومة أعلنت عن إصلاحات جديدة في قطاع التعليم"
]

print("
" + "="*70)
print("TEST PREDICTIONS")
print("="*70)

for i, text in enumerate(test_examples, 1):
    print(f"
--- Test {i} ---")
    print(f"Text: {text[:80]}{'...' if len(text) > 80 else ''}")
    
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", max_length=128, 
                      truncation=True, padding=True)
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0]
        pred = torch.argmax(probs).item()
        confidence = probs[pred].item()
    
    # Display results
    predicted_label = label_map[pred]
    predicted_name = label_names[predicted_label]
    
    print(f"Predicted: {predicted_name} ({predicted_label})")
    print(f"Confidence: {confidence:.2%}")
    
    # Show all probabilities
    print("All probabilities:")
    for idx in range(5):
        label = label_map[idx]
        name = label_names[label]
        prob = probs[idx].item()
        bar = "█" * int(prob * 20)
        print(f"  {label} ({name:12s}): {prob:6.2%} {bar}")

print("
" + "="*70)
print("✅ ALL TESTS COMPLETED SUCCESSFULLY!")
print("="*70)

print("
📊 Model Performance Summary:")
print("  • Accuracy: 76.17%")
print("  • Macro F1: 67.33%")
print("  • Best classes: F (84.16%), R (77.69%), N (80.67%)")
print("  • Challenge: S class (30.12%) - limited training data")

print(f"
🔗 Model Card: https://huggingface.co/{REPO_ID}")
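
As a lighter-weight alternative to the script above, the Transformers pipeline API can wrap the same checkpoint in one call. This is a sketch: the label strings it returns depend on the id2label mapping stored in the repository's config, and fall back to generic names such as LABEL_0 if that mapping is missing.

# Alternative: one-liner inference via the Transformers pipeline API
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="aurelius2023/xlm-roberta-algerian-misinformation",
)

# "Algeria won the 2024 World Cup" (same example as above)
result = classifier("الجزائر فازت بكأس العالم 2024")
print(result)  # e.g. [{'label': '...', 'score': 0.97}]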

Class Imbalance Handling

  • Weighted loss function using class weights (see the sketch below)
  • Emphasis on minority classes (M, S)
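
The training code is not published with this card, so the following is only a minimal sketch of one common way to implement a class-weighted loss with the Hugging Face Trainer; the weight values and the WeightedTrainer class are illustrative assumptions, not the authors' actual setup.

# weighted_loss_sketch.py -- illustrative, not this model's actual training code
import torch
from torch.nn import CrossEntropyLoss
from transformers import Trainer

# Illustrative inverse-frequency weights for the five classes (F, R, N, M, S);
# the weights actually used to train this model are not published
CLASS_WEIGHTS = torch.tensor([0.70, 0.79, 0.77, 1.13, 8.57])

class WeightedTrainer(Trainer):
    """Trainer subclass that swaps in a class-weighted cross-entropy loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        loss_fct = CrossEntropyLoss(weight=CLASS_WEIGHTS.to(logits.device))
        loss = loss_fct(logits.view(-1, model.config.num_labels),
                        labels.view(-1))
        return (loss, outputs) if return_outputs else loss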

Comparison with DziriBERT

Metric       DziriBERT (3.1)   XLM-RoBERTa (3.2)
Macro F1              67.49%              67.33%
Accuracy              77.27%              76.17%
F class F1            84.75%              84.16%
M class F1            61.53%              63.99%
S class F1            30.08%              30.12%

Acknowledgments

  • Base model: xlm-roberta-base
  • Dataset: Custom Algerian Darija misinformation dataset
  • Framework: Hugging Face Transformers

Model Card Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.


Last Updated: December 2025
