EU PII Safeguard

Multilingual PII Detection Model for European Languages

A state-of-the-art multilingual model for detecting Personally Identifiable Information (PII) in 26 European languages: the 24 official EU languages plus Russian and Ukrainian. It is designed for GDPR compliance, privacy-preserving AI applications, and secure handling of sensitive data in multilingual settings. Enterprises, researchers, and data protection teams can use it to identify and safeguard PII with high accuracy (≈98%) across diverse European contexts.

🎯 Model Performance

  • Global F1 Score: 97.02%
  • 26 Languages Supported
  • 42 PII Entity Types
  • Consistent 95%+ F1 across all languages

🌍 Supported Languages

🇧🇬 Bulgarian • 🇨🇿 Czech • 🇩🇰 Danish • 🇩🇪 German • 🇬🇷 Greek • 🇬🇧 English • 🇪🇸 Spanish • 🇪🇪 Estonian • 🇫🇮 Finnish • 🇫🇷 French • 🇮🇪 Irish • 🇭🇷 Croatian • 🇭🇺 Hungarian • 🇮🇹 Italian • 🇱🇹 Lithuanian • 🇱🇻 Latvian • 🇲🇹 Maltese • 🇳🇱 Dutch • 🇵🇱 Polish • 🇵🇹 Portuguese • 🇷🇴 Romanian • 🇷🇺 Russian • 🇸🇰 Slovak • 🇸🇮 Slovenian • 🇸🇪 Swedish • 🇺🇦 Ukrainian

🔍 Detected PII Types

  • Personal: First/Last/Middle Names, Age, Gender, Ethnicity
  • Contact: Email, Phone, Address, City, Country, Postal Code
  • Financial: Credit Card, IBAN, Account Numbers, Salary
  • Identity: National ID, Passport, Driver License, Tax ID
  • Health: Medical Conditions, Health Insurance ID
  • Digital: IP Address, MAC Address, URL, Username, Password
  • And more: 42 total entity types

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "tabularisai/eu-pii-safeguard"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example text (French)
text = "Bonjour, je suis Marie Dubois, email: [email protected]"

# Tokenize (truncating to the model's 256-token maximum) and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Get predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

print("Detected PII:")
for token, label in zip(tokens, predicted_labels):
    if label != "O":
        print(f"  {label}: {token}")
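
The loop above prints one line per subword piece, so a surname such as Dubois may show up as several fragments. The following is a minimal sketch of merging those pieces back into whole entities. It assumes the labels follow the B-/I- scheme and that the tokens are SentencePiece pieces in which "▁" marks the start of a word, as in XLM-RoBERTa. The function name group_entities and the label names FIRSTNAME and LASTNAME are hypothetical; check model.config.id2label for the real label set.

```python
from typing import List, Optional, Tuple

# Special tokens used by XLM-RoBERTa tokenizers; they never belong to an entity.
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO labels into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_pieces: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffers.
        nonlocal current_type, current_pieces
        if current_type is not None:
            text = "".join(current_pieces).replace("\u2581", " ").strip()
            entities.append((current_type, text))
        current_type, current_pieces = None, []

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS or label == "O":
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        # A B- tag or a change of type starts a new entity; I- continues it.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_pieces.append(token)
    flush()
    return entities


# Hand-written tokens and labels (assumed, not real model output).
example_tokens = ["<s>", "\u2581je", "\u2581suis", "\u2581Marie", "\u2581Du", "bois", "</s>"]
example_labels = ["O", "O", "O", "B-FIRSTNAME", "B-LASTNAME", "I-LASTNAME", "O"]
print(group_entities(example_tokens, example_labels))
```

With the Quick Start variables, the call would be group_entities(tokens, predicted_labels).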

📊 Performance by Language

| Language | F1 Score | Language | F1 Score |
|---|---|---|---|
| Irish (ga) | 97.98% | Dutch (nl) | 97.24% |
| Bulgarian (bg) | 97.80% | Slovak (sk) | 97.21% |
| Italian (it) | 97.68% | Swedish (sv) | 97.09% |
| Portuguese (pt) | 97.61% | Russian (ru) | 97.04% |
| Slovenian (sl) | 97.51% | Croatian (hr) | 96.93% |
| Czech (cs) | 97.51% | Polish (pl) | 96.63% |
| Hungarian (hu) | 97.50% | French (fr) | 96.59% |
| Estonian (et) | 97.41% | Romanian (ro) | 96.54% |
| Latvian (lv) | 97.40% | Danish (da) | 96.36% |
| English (en) | 97.36% | German (de) | 96.22% |
| Spanish (es) | 97.34% | Ukrainian (uk) | 96.09% |
| Finnish (fi) | 97.30% | Maltese (mt) | 95.78% |
| Lithuanian (lt) | 97.24% | Greek (el) | 95.42% |
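
As a quick sanity check, the per-language scores can be summarized in a few lines of Python. The sketch below copies the values from the table and computes their unweighted mean, which comes to about 97.03%. How the reported global F1 of 97.02% was aggregated is not stated here, so the unweighted mean is only a rough point of comparison.

```python
# Per-language F1 scores (percent), copied from the table above.
f1_by_language = {
    "ga": 97.98, "bg": 97.80, "it": 97.68, "pt": 97.61, "sl": 97.51,
    "cs": 97.51, "hu": 97.50, "et": 97.41, "lv": 97.40, "en": 97.36,
    "es": 97.34, "fi": 97.30, "lt": 97.24, "nl": 97.24, "sk": 97.21,
    "sv": 97.09, "ru": 97.04, "hr": 96.93, "pl": 96.63, "fr": 96.59,
    "ro": 96.54, "da": 96.36, "de": 96.22, "uk": 96.09, "mt": 95.78,
    "el": 95.42,
}

# Unweighted (macro) mean across languages.
mean_f1 = sum(f1_by_language.values()) / len(f1_by_language)

# Best and worst languages by F1.
highest = max(f1_by_language, key=f1_by_language.get)
lowest = min(f1_by_language, key=f1_by_language.get)

print(f"languages: {len(f1_by_language)}")
print(f"unweighted mean F1: {mean_f1:.2f}%")
print(f"highest: {highest} ({f1_by_language[highest]:.2f}%)")
print(f"lowest: {lowest} ({f1_by_language[lowest]:.2f}%)")
```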

💼 Use Cases

  • 🔒 Data Privacy: Automatically detect and anonymize PII before processing
  • ⚖️ GDPR Compliance: Ensure regulatory compliance across EU markets
  • 🛡️ Security: Prevent data breaches by identifying sensitive information
  • 📊 Data Governance: Audit and catalog personal data in multilingual datasets
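
For the anonymization use case, one possible approach is to replace each detected span with a placeholder before the text leaves a trusted environment. The sketch below assumes entities arrive as dictionaries with "entity_group", "start", and "end" keys, the shape returned by the transformers token-classification pipeline when an aggregation strategy is set. The function name redact, the placeholder format, and the label names are hypothetical, and overlapping spans are not handled.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    # Process spans from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written spans (assumed, not real model output).
sample = "Bonjour, je suis Marie Dubois."
first = sample.index("Marie")
last = sample.index("Dubois")
sample_entities = [
    {"entity_group": "FIRSTNAME", "start": first, "end": first + len("Marie")},
    {"entity_group": "LASTNAME", "start": last, "end": last + len("Dubois")},
]
print(redact(sample, sample_entities))
```

Keeping the label inside the placeholder preserves some structure for downstream processing; a fixed mask such as [REDACTED] reveals less if the entity type itself is sensitive.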

🏗️ Model Architecture

  • Base Model: XLM-RoBERTa-large
  • Task: Token Classification
  • Labels: 74 (B-/I- format for 42 entity types)
  • Max Length: 256 tokens
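
Documents longer than the 256-token maximum need to be split before inference. One common approach, sketched below, is to slide overlapping windows over the token sequence so that an entity cut at one window boundary is still seen whole in the next window. The function name window_spans and the default overlap of 64 tokens are assumptions, not values taken from the model's training setup.

```python
from typing import List, Tuple


def window_spans(n_tokens: int, max_len: int = 256, overlap: int = 64) -> List[Tuple[int, int]]:
    """Return [start, end) token windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens. Special tokens such as
    <s> and </s> count against the model limit, so callers that add them
    afterwards may want to pass max_len=254.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        # Step back by `overlap` tokens so the next window overlaps this one.
        start = end - overlap
    return spans


print(window_spans(600))  # three overlapping windows covering 600 tokens
print(window_spans(100))  # a short input fits in a single window
```

Predictions from overlapping regions then have to be reconciled, for example by keeping the label from the window in which the token sits farther from the edge. Hugging Face tokenizers also accept return_overflowing_tokens=True together with a stride argument, which produces similar windows at tokenization time.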

🔄 Community Feedback

We're actively seeking feedback from the community! Please:

  • 🐛 Report issues or edge cases
  • 💡 Suggest improvements
  • 🧪 Share your use cases and results
  • 📊 Contribute evaluation on new datasets

🏢 About Tabularis AI

Developed by Tabularis AI - Building privacy-preserving AI solutions for enterprise data protection.


For questions, collaborations, or licensing inquiries: [email protected]
