JuaKazi Swahili Gender Bias Classifier v3

Fine-tuned afro-xlmr-base for binary gender bias detection in Swahili text.

Part of the JuaKazi Gender Sensitization Engine.

## Validation Metrics (v3)

| Metric | Value |
|---|---|
| BIAS Precision | 0.898 |
| BIAS Recall | 0.910 |
| BIAS F1 | 0.904 |
| Decision threshold | 0.50 |
| Train size | 45,628 |
| Val size | 8,052 |
| Ground truth | ground_truth_sw_v5.csv (64,723 rows) |

## Why v3? Retraining Rationale

Independent evaluation (Mar 2026) on a 9,709-sample labeled test set showed v2 had a critical precision problem:

| Metric | v2 | v3 |
|---|---|---|
| BIAS Precision | 0.330 | 0.898 |
| BIAS Recall | 0.976 | 0.910 |
| BIAS F1 | 0.493 | 0.904 |
| False Positives | 333 | 46 |
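
As a quick consistency check, the F1 values reported for v2 and v3 can be recomputed from the precision and recall figures in the same comparison. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the helper name `f1_score` is illustrative and is not part of the JuaKazi codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the v2 vs v3 comparison
reported = {
    "v2": (0.330, 0.976, 0.493),
    "v3": (0.898, 0.910, 0.904),
}

for version, (precision, recall, reported_f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 3)
    print(f"{version}: recomputed F1 = {recomputed}, reported F1 = {reported_f1}")
```

Both recomputed values agree with the reported F1 to three decimal places, which shows how v2's low precision pulled its F1 down despite very high recall.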

Root cause (v2): pos_weight ≈ 58x plus only 4K neutral training rows, so the model flagged gendered terms (wa kike, mama, wanawake) as bias regardless of context.

v3 fixes:

  • pos_weight = 10 (hard cap, down from ~58x)
  • neutral_ratio = 40: 40K neutral rows in training (was 4K)
  • Counter-stereotype rows labelled NEUTRAL during training
  • annotation_error rows excluded from training
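
The data-side fixes above could look roughly like the following. This is a minimal sketch, not the actual JuaKazi training code: the row fields `counter_stereotype` and `annotation_error`, and the helper names, are hypothetical. The card also does not say how the raw positive-class weight was derived, so the sketch only shows the capping step.

```python
from typing import Dict, Iterable, List

POS_WEIGHT_CAP = 10.0  # hard cap stated in the card


def capped_pos_weight(raw_weight: float, cap: float = POS_WEIGHT_CAP) -> float:
    """Clamp the positive-class loss weight so it never exceeds the cap."""
    return min(raw_weight, cap)


def prepare_rows(rows: Iterable[Dict]) -> List[Dict]:
    """Drop annotation-error rows and relabel counter-stereotype rows as NEUTRAL.

    Field names are assumptions made for illustration only.
    """
    prepared = []
    for row in rows:
        if row.get("annotation_error"):
            continue  # excluded from training
        label = "NEUTRAL" if row.get("counter_stereotype") else row["label"]
        prepared.append({"text": row["text"], "label": label})
    return prepared


# Placeholder rows, not real training data
example_rows = [
    {"text": "a", "label": "BIAS"},
    {"text": "b", "label": "BIAS", "counter_stereotype": True},
    {"text": "c", "label": "BIAS", "annotation_error": True},
    {"text": "d", "label": "NEUTRAL"},
]

print(capped_pos_weight(58.0))
print([row["label"] for row in prepare_rows(example_rows)])
```

Capping the weight limits how strongly the loss rewards predicting BIAS, while the larger neutral set and the relabelled counter-stereotype rows give the model examples of gendered terms used without bias.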

## Usage

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="juakazike/sw-bias-classifier-v3",
    truncation=True,
    max_length=128,
)

pipe("Wanawake hawafai kuwa viongozi wa nchi")
# [{"label": "BIAS", "score": 0.97}]

pipe("Daktari wa kike alitibu wagonjwa hospitalini")
# [{"label": "NEUTRAL", "score": 0.83}]
```

Recommended decision threshold: 0.50
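
One way to apply the threshold to a pipeline prediction is sketched below. It assumes the model exposes exactly two labels, `BIAS` and `NEUTRAL`, whose scores sum to 1, so the BIAS probability of a `NEUTRAL` prediction is `1 - score`. The helper names are illustrative, and treating a score exactly at the threshold as biased is also an assumption.

```python
from typing import Dict, Union

THRESHOLD = 0.50  # recommended decision threshold from the card

Prediction = Dict[str, Union[str, float]]


def bias_probability(pred: Prediction) -> float:
    """Convert a top-label prediction into P(BIAS), assuming two classes."""
    score = float(pred["score"])
    return score if pred["label"] == "BIAS" else 1.0 - score


def is_biased(pred: Prediction, threshold: float = THRESHOLD) -> bool:
    """Flag the text when P(BIAS) reaches the threshold."""
    return bias_probability(pred) >= threshold


# Predictions shaped like the pipeline outputs shown in the card
print(is_biased({"label": "BIAS", "score": 0.97}))
print(is_biased({"label": "NEUTRAL", "score": 0.83}))
```

Raising `threshold` above 0.50 trades recall for precision, which may suit applications where false positives are costly.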

## Limitations

  • Trained on Kenyan and Tanzanian Swahili; coverage of Sheng and Ugandan Swahili is limited
  • Implicit and proverbial bias patterns may still be missed
  • For production use, combine with the JuaKazi rules-based detection layer
  • Cohen's Kappa inter-annotator agreement: measurement in progress

## Citation

JuaKazi Gender Sensitization Engine, AI BRIDGE / AfriLabs, March 2026.
