Qwen2.5-7B PII Detection — Merged Model

Standalone merged model — no PEFT library required. Load and run directly with transformers.
For the lightweight adapter version (160MB vs 15GB): vineeth453/qwen25-7b-pii-detection-lora

LoRA adapter weights merged into Qwen/Qwen2.5-7B-Instruct after fine-tuning on ai4privacy/pii-masking-200k. Extracts 56 types of personally identifiable information across 4 languages and returns structured JSON output.

Built as the PII Detection component of a Phase 1 Input Guardrail gateway for an enterprise LLM security system.


Evaluation Results

Evaluated on 10,464 held-out samples (5% split from ai4privacy/pii-masking-200k).

Metric                   Score
Micro F1                 0.967
Macro F1                 0.961
Micro Precision          0.967
Micro Recall             0.968
Malformed JSON outputs   0 / 500 (0.0%)
Val Loss (final)         0.0033
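
The card does not include its scoring script, so the following is only a minimal sketch of how micro and macro F1 might be computed from the model's JSON output. It assumes exact matching on (text, label) pairs per sample; the function names `prf` and `score` are illustrative, and the author's actual matching rule may differ.

```python
from collections import Counter

def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def score(gold_docs: list, pred_docs: list) -> tuple:
    """gold_docs / pred_docs: one list of (text, label) tuples per sample."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        g, p = Counter(gold), Counter(pred)
        for (_, label), n in (g & p).items():   # predicted and in gold
            tp[label] += n
        for (_, label), n in (p - g).items():   # predicted but not in gold
            fp[label] += n
        for (_, label), n in (g - p).items():   # in gold but missed
            fn[label] += n
    labels = set(tp) | set(fp) | set(fn)
    per_label = {l: prf(tp[l], fp[l], fn[l]) for l in labels}
    # Micro: pool counts over all labels. Macro: unweighted mean of per-label F1.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro_f1 = sum(f for _, _, f in per_label.values()) / len(per_label) if per_label else 0.0
    return micro, macro_f1, per_label

gold = [[("John", "FIRSTNAME"), ("1.2.3.4", "IP")]]
pred = [[("John", "FIRSTNAME"), ("1.2.3.4", "IPV4")]]
micro, macro_f1, per_label = score(gold, pred)
print(micro, macro_f1)
```

In this toy case the address is found but labeled IPV4 instead of IP, so it counts as one false positive for IPV4 and one false negative for IP under exact matching.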

Per-Entity F1 Scores

Label Precision Recall F1 Support
ACCOUNTNAME 1.000 1.000 1.000 28
ACCOUNTNUMBER 1.000 1.000 1.000 41
AGE 1.000 1.000 1.000 27
AMOUNT 1.000 1.000 1.000 36
BIC 1.000 1.000 1.000 7
BITCOINADDRESS 0.923 1.000 0.960 24
BUILDINGNUMBER 0.968 0.968 0.968 31
CITY 1.000 0.963 0.981 27
COMPANYNAME 1.000 1.000 1.000 35
COUNTY 1.000 1.000 1.000 29
CREDITCARDCVV 1.000 1.000 1.000 10
CREDITCARDISSUER 1.000 1.000 1.000 16
CREDITCARDNUMBER 0.829 0.935 0.879 31
CURRENCY 0.909 0.870 0.889 23
CURRENCYCODE 1.000 1.000 1.000 8
CURRENCYNAME 0.667 0.750 0.706 8
CURRENCYSYMBOL 1.000 1.000 1.000 20
DATE 0.884 0.974 0.927 39
DOB 0.955 0.808 0.875 26
EMAIL 1.000 1.000 1.000 42
ETHEREUMADDRESS 1.000 1.000 1.000 11
EYECOLOR 1.000 1.000 1.000 10
FIRSTNAME 0.994 0.994 0.994 158
GENDER 1.000 1.000 1.000 35
HEIGHT 1.000 1.000 1.000 7
IBAN 1.000 1.000 1.000 29
IP 0.727 0.267 0.390 30
IPV4 0.732 0.909 0.811 33
IPV6 0.711 1.000 0.831 27
JOBAREA 1.000 1.000 1.000 40
JOBTITLE 1.000 1.000 1.000 37
JOBTYPE 1.000 1.000 1.000 31
LASTNAME 1.000 1.000 1.000 47
LITECOINADDRESS 1.000 0.714 0.833 7
MAC 1.000 1.000 1.000 12
MASKEDNUMBER 0.923 0.800 0.857 30
MIDDLENAME 0.944 1.000 0.971 34
NEARBYGPSCOORDINATE 1.000 1.000 1.000 17
ORDINALDIRECTION 1.000 1.000 1.000 17
PASSWORD 1.000 1.000 1.000 31
PHONEIMEI 1.000 1.000 1.000 19
PHONENUMBER 1.000 1.000 1.000 21
PIN 1.000 1.000 1.000 6
PREFIX 1.000 1.000 1.000 29
SECONDARYADDRESS 1.000 1.000 1.000 31
SEX 1.000 1.000 1.000 26
SSN 1.000 1.000 1.000 16
STATE 1.000 1.000 1.000 31
STREET 1.000 1.000 1.000 39
TIME 1.000 1.000 1.000 20
URL 1.000 1.000 1.000 29
USERAGENT 1.000 1.000 1.000 33
USERNAME 1.000 1.000 1.000 30
VEHICLEVIN 1.000 1.000 1.000 13
VEHICLEVRM 1.000 1.000 1.000 15
ZIPCODE 0.970 0.970 0.970 33

Note on IP label (F1=0.390): The dataset contains three overlapping IP labels (IP, IPV4, IPV6). The low recall on IP (0.267) arises because the model correctly identifies the address but tags it as IPV4 or IPV6, which likely also accounts for the reduced precision on those two labels (0.732 and 0.711). This is a label ambiguity in the dataset, not a detection failure.
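
One way to sidestep the ambiguity downstream is to relabel IP-family entities deterministically from the address itself. The sketch below uses Python's standard `ipaddress` module; the helper name `normalize_ip_label` and the choice to map everything to IPV4/IPV6 are assumptions for illustration, not part of the released model or its evaluation.

```python
import ipaddress

IP_LABELS = {"IP", "IPV4", "IPV6"}

def normalize_ip_label(entity: dict) -> dict:
    """Relabel an IP-family entity as IPV4 or IPV6 based on the parsed address."""
    if entity.get("label") not in IP_LABELS:
        return entity
    try:
        version = ipaddress.ip_address(entity["text"].strip()).version
    except ValueError:
        # Not a parseable address: keep the model's label unchanged.
        return entity
    return {**entity, "label": f"IPV{version}"}

print(normalize_ip_label({"text": "192.168.0.1", "label": "IP"}))
print(normalize_ip_label({"text": "2001:db8::1", "label": "IP"}))
```

If a single generic label is preferred instead, the same check can collapse all three labels to IP before scoring or redaction.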


How to Get Started

Installation

pip install transformers accelerate torch
# No PEFT required for this merged model

Load and Run Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, json

model = AutoModelForCausalLM.from_pretrained(
    "vineeth453/qwen25-7b-pii-detection",
    device_map="auto",
    torch_dtype=torch.bfloat16,   # use bfloat16 for A100/H100; float16 for older GPUs
)
tokenizer = AutoTokenizer.from_pretrained("vineeth453/qwen25-7b-pii-detection")
model.eval()

def detect_pii(text: str) -> dict:
    prompt = (
        "<|im_start|>system\n"
        "You are a PII detection system. Extract all personally identifiable information.\n"
        'Return ONLY valid JSON: {"entities":[{"text":"...","label":"..."}]}\n'
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{text}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id
        )
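    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).replace("<|im_end|>", "").strip()
    # Raises json.JSONDecodeError if the output is truncated or malformed.
    return json.loads(response)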

# English
print(detect_pii("Contact John Smith at john@example.com or call +1-555-867-5309"))
# {"entities": [{"text": "John", "label": "FIRSTNAME"}, {"text": "Smith", "label": "LASTNAME"},
#               {"text": "john@example.com", "label": "EMAIL"}, {"text": "+1-555-867-5309", "label": "PHONENUMBER"}]}

# German
print(detect_pii("Patient Lena Müller, born 14.03.1987, lives at Hauptstraße 22, Berlin."))
# {"entities": [{"text": "Lena", "label": "FIRSTNAME"}, {"text": "Müller", "label": "LASTNAME"},
#               {"text": "14.03.1987", "label": "DOB"}, {"text": "Hauptstraße", "label": "STREET"},
#               {"text": "22", "label": "BUILDINGNUMBER"}, {"text": "Berlin", "label": "STATE"}]}

# French
print(detect_pii("Merci de contacter Marie Dupont à marie.dupont@societe.fr avant le 30 mars."))
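
Because generation is capped at 200 new tokens, a long input with many entities could in principle yield truncated JSON, in which case `json.loads` raises. A defensive parser along the following lines may be useful in a gateway; `parse_entities` is a hypothetical helper, not something shipped with the model, and returning an empty list on failure is one design choice among several (a gateway might prefer to block the request instead).

```python
import json

def parse_entities(response: str) -> list:
    """Return the entity list from a model response, or [] if it is unusable."""
    # Tolerate stray text around the JSON object.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return []
    entities = data.get("entities", []) if isinstance(data, dict) else []
    if not isinstance(entities, list):
        return []
    # Keep only well-formed {"text": str, "label": str} items.
    return [
        e for e in entities
        if isinstance(e, dict)
        and isinstance(e.get("text"), str)
        and isinstance(e.get("label"), str)
    ]

print(parse_entities('{"entities":[{"text":"John","label":"FIRSTNAME"}]}'))
print(parse_entities('{"entities":[{"text":"Jo'))  # truncated output
```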

Memory Requirements

Precision                                         VRAM Required
bfloat16 (default)                                ~15GB
4-bit quantized (use adapter repo instead)        ~5GB

For GPU-constrained environments, use the adapter version with 4-bit quantization instead.
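
The figures above are roughly what the weights alone would occupy. As a back-of-the-envelope sketch (assuming the 7.6B parameter count quoted in the training details and decimal gigabytes; the function name is illustrative):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7.6e9  # parameter count quoted in the training details

bf16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)
print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")  # bf16: 15.2 GB, 4-bit: 3.8 GB
```

The gap between 3.8 GB and the quoted ~5GB presumably covers quantization constants, any layers kept in higher precision, activations, and the KV cache. Actual usage depends on sequence length and batch size.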


Training Details

Training Data

  • Dataset: ai4privacy/pii-masking-200k
  • Size: 209,261 samples (198,797 train / 10,464 val, 95/5 split)
  • Languages: English (43k), French (62k), German (53k), Italian (51k)
  • Entity types: 56 PII categories

Training Hyperparameters

Parameter                      Value
Base model                     Qwen/Qwen2.5-7B-Instruct
Method                         QLoRA → merged
Quantization during training   4-bit NF4 + double quantization
Compute dtype                  bfloat16
LoRA rank (r)                  16
LoRA alpha                     32
LoRA dropout                   0.05
LoRA target modules            q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters           40,370,176 (0.53% of 7.6B)
Epochs                         1
Per-device batch size          4
Gradient accumulation          8 (effective batch = 32)
Learning rate                  2e-4
LR scheduler                   Cosine decay
Warmup steps                   186
Weight decay                   0.01
Optimizer                      paged_adamw_8bit
Max sequence length            512
Max grad norm                  1.0
Hardware                       NVIDIA A100 40GB
Training time                  10.7 hours
Final train loss               0.00517
Best val loss                  0.00330
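
The trainable-parameter figure can be reproduced with a little arithmetic. The sketch below assumes the layer dimensions from the published Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128); these are not stated in this card. It also estimates optimizer steps per epoch, assuming the final partial batch is kept.

```python
import math

# Assumed Qwen2.5-7B dimensions (from the public model config, not this card).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128      # 4 key/value heads x head dimension 128
LAYERS = 28
RANK = 16             # LoRA rank from the hyperparameter table

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per module: r * (in + out) parameters.
per_layer = sum(RANK * (i + o) for i, o in TARGETS.values())
trainable = per_layer * LAYERS
print(trainable)  # 40370176

# Optimizer steps for one epoch at effective batch size 32 (4 x 8).
steps = math.ceil(198_797 / 32)
print(steps, round(186 / steps, 3))  # 6213 0.03
```

Under these assumptions the count matches the reported 40,370,176, and the 186 warmup steps correspond to roughly 3% of one epoch.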

Uses

Direct Use

Enterprise input guardrail systems for detecting and redacting PII from user queries before they reach an LLM. Suitable for HR, legal, healthcare, and financial applications.
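
Since the model returns entity strings rather than character offsets, redaction has to locate each string in the input. A minimal sketch, assuming the `{"text", "label"}` entity format shown in the inference example; `redact` is a hypothetical helper and replaces every occurrence of each detected string in a single pass:

```python
import re

def redact(text: str, entities: list) -> str:
    """Replace every occurrence of each detected entity with a [LABEL] tag."""
    by_text = {e["text"]: e["label"] for e in entities if e.get("text")}
    if not by_text:
        return text
    # Longest alternatives first so a short entity never splits a longer one.
    ordered = sorted(by_text, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: f"[{by_text[m.group(0)]}]", text)

entities = [
    {"text": "John", "label": "FIRSTNAME"},
    {"text": "Smith", "label": "LASTNAME"},
    {"text": "john@example.com", "label": "EMAIL"},
    {"text": "+1-555-867-5309", "label": "PHONENUMBER"},
]
print(redact("Contact John Smith at john@example.com or call +1-555-867-5309", entities))
# Contact [FIRSTNAME] [LASTNAME] at [EMAIL] or call [PHONENUMBER]
```

Matching is case-sensitive and purely textual, so an entity string that also appears in a non-PII context would be masked there too.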

Downstream Use

  • PII redaction pipelines
  • Compliance auditing tools (GDPR, CCPA, HIPAA)
  • Data anonymization workflows
  • Pre-processing layer in RAG or LLM gateway systems

Out-of-Scope Use

  • Real-time or high-throughput inference on CPU only (the ~15GB model is too slow without a GPU)
  • Languages outside EN/FR/DE/IT without further fine-tuning
  • Use as the sole PII detection mechanism in high-stakes settings without human review

Bias, Risks, and Limitations

  • IP label ambiguity: The model frequently assigns IP addresses to IPV4/IPV6 instead of IP (recall on IP is 0.267). Post-processing regex validation is recommended.
  • CREDITCARDNUMBER vs PHONEIMEI: Long numeric strings can be confused between these labels (CREDITCARDNUMBER F1=0.879). Luhn algorithm post-processing can mitigate this, ideally combined with a length check, since IMEIs are 15 digits long and also carry a Luhn check digit.
  • Low-support labels: Labels with <10 samples (e.g., CURRENCYNAME) have less reliable F1 estimates.
  • Language coverage: EN/FR/DE/IT only. Performance on other languages may degrade.
  • Merged model note: LoRA weights are merged into bf16 base weights. This model is ~15GB and does not support 4-bit quantization natively — use the adapter repo for memory-constrained inference.
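
The Luhn post-processing suggested above could look roughly like this. It is a sketch only: `luhn_valid` and `flag_suspect_cards` are illustrative names, and it assumes the card numbers in the target data actually carry valid check digits, which synthetic data may not guarantee.

```python
def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum over the digits in `number` (separators ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def flag_suspect_cards(entities: list) -> list:
    """Return CREDITCARDNUMBER entities whose text fails the Luhn check."""
    return [
        e for e in entities
        if e["label"] == "CREDITCARDNUMBER" and not luhn_valid(e["text"])
    ]

print(luhn_valid("4111 1111 1111 1111"))  # True  (common test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False
print(luhn_valid("490154203237518"))      # True  (15-digit IMEI-style number)
```

As the last line shows, a valid IMEI also passes Luhn, so the checksum alone cannot separate the two labels; digit count and surrounding context are still needed.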

Environmental Impact

  • Hardware: NVIDIA A100 40GB (Google Colab Pro)
  • Training time: ~10.7 hours
  • Cloud provider: Google Cloud (Colab)
  • Compute region: US

Model Card Authors

Vineeth (Master's project, Enterprise Guardrails System)
Adapter repo: vineeth453/qwen25-7b-pii-detection-lora
