# Qwen2.5-7B PII Detection - LoRA Adapter

**Adapter-only repo**: requires `Qwen/Qwen2.5-7B-Instruct` as the base model plus the PEFT library.
For a standalone model (no PEFT needed), use `vineeth453/qwen25-7b-pii-detection`.
Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA on the ai4privacy/pii-masking-200k dataset. Extracts 56 types of personally identifiable information across 4 languages (English, French, German, Italian) and returns structured JSON output.
Built as the PII Detection component of a Phase 1 Input Guardrail gateway for an enterprise LLM security system.
## Evaluation Results
Evaluated on 10,464 held-out samples (5% split from ai4privacy/pii-masking-200k).
| Metric | Score |
|---|---|
| Micro F1 | 0.967 |
| Macro F1 | 0.961 |
| Micro Precision | 0.967 |
| Micro Recall | 0.968 |
| Malformed JSON outputs | 0 / 500 (0.0%) |
| Val Loss (final) | 0.0033 |
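The evaluation script itself is not included here. As a minimal sketch, assuming entities are matched as exact `(text, label)` pairs between gold and predicted outputs, the micro-averaged scores in the table can be computed like this:

```python
from collections import Counter

def entity_prf(gold, pred):
    """Micro precision/recall/F1 over (text, label) entity pairs.

    gold/pred: lists of per-sample entity lists; each entity is a
    (text, label) tuple. Matching is exact; multiset intersection
    handles repeated entities within a sample.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_cnt, p_cnt = Counter(g), Counter(p)
        overlap = sum((g_cnt & p_cnt).values())  # entities present in both
        tp += overlap
        fp += sum(p_cnt.values()) - overlap      # predicted but not gold
        fn += sum(g_cnt.values()) - overlap      # gold but not predicted
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Macro F1 is the unweighted mean of the same computation run per label; the exact matching criterion used for the reported numbers may differ.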
### Per-Entity F1 Scores
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ACCOUNTNAME | 1.000 | 1.000 | 1.000 | 28 |
| ACCOUNTNUMBER | 1.000 | 1.000 | 1.000 | 41 |
| AGE | 1.000 | 1.000 | 1.000 | 27 |
| AMOUNT | 1.000 | 1.000 | 1.000 | 36 |
| BIC | 1.000 | 1.000 | 1.000 | 7 |
| BITCOINADDRESS | 0.923 | 1.000 | 0.960 | 24 |
| BUILDINGNUMBER | 0.968 | 0.968 | 0.968 | 31 |
| CITY | 1.000 | 0.963 | 0.981 | 27 |
| COMPANYNAME | 1.000 | 1.000 | 1.000 | 35 |
| COUNTY | 1.000 | 1.000 | 1.000 | 29 |
| CREDITCARDCVV | 1.000 | 1.000 | 1.000 | 10 |
| CREDITCARDISSUER | 1.000 | 1.000 | 1.000 | 16 |
| CREDITCARDNUMBER | 0.829 | 0.935 | 0.879 | 31 |
| CURRENCY | 0.909 | 0.870 | 0.889 | 23 |
| CURRENCYCODE | 1.000 | 1.000 | 1.000 | 8 |
| CURRENCYNAME | 0.667 | 0.750 | 0.706 | 8 |
| CURRENCYSYMBOL | 1.000 | 1.000 | 1.000 | 20 |
| DATE | 0.884 | 0.974 | 0.927 | 39 |
| DOB | 0.955 | 0.808 | 0.875 | 26 |
| EMAIL | 1.000 | 1.000 | 1.000 | 42 |
| ETHEREUMADDRESS | 1.000 | 1.000 | 1.000 | 11 |
| EYECOLOR | 1.000 | 1.000 | 1.000 | 10 |
| FIRSTNAME | 0.994 | 0.994 | 0.994 | 158 |
| GENDER | 1.000 | 1.000 | 1.000 | 35 |
| HEIGHT | 1.000 | 1.000 | 1.000 | 7 |
| IBAN | 1.000 | 1.000 | 1.000 | 29 |
| IP | 0.727 | 0.267 | 0.390 | 30 |
| IPV4 | 0.732 | 0.909 | 0.811 | 33 |
| IPV6 | 0.711 | 1.000 | 0.831 | 27 |
| JOBAREA | 1.000 | 1.000 | 1.000 | 40 |
| JOBTITLE | 1.000 | 1.000 | 1.000 | 37 |
| JOBTYPE | 1.000 | 1.000 | 1.000 | 31 |
| LASTNAME | 1.000 | 1.000 | 1.000 | 47 |
| LITECOINADDRESS | 1.000 | 0.714 | 0.833 | 7 |
| MAC | 1.000 | 1.000 | 1.000 | 12 |
| MASKEDNUMBER | 0.923 | 0.800 | 0.857 | 30 |
| MIDDLENAME | 0.944 | 1.000 | 0.971 | 34 |
| NEARBYGPSCOORDINATE | 1.000 | 1.000 | 1.000 | 17 |
| ORDINALDIRECTION | 1.000 | 1.000 | 1.000 | 17 |
| PASSWORD | 1.000 | 1.000 | 1.000 | 31 |
| PHONEIMEI | 1.000 | 1.000 | 1.000 | 19 |
| PHONENUMBER | 1.000 | 1.000 | 1.000 | 21 |
| PIN | 1.000 | 1.000 | 1.000 | 6 |
| PREFIX | 1.000 | 1.000 | 1.000 | 29 |
| SECONDARYADDRESS | 1.000 | 1.000 | 1.000 | 31 |
| SEX | 1.000 | 1.000 | 1.000 | 26 |
| SSN | 1.000 | 1.000 | 1.000 | 16 |
| STATE | 1.000 | 1.000 | 1.000 | 31 |
| STREET | 1.000 | 1.000 | 1.000 | 39 |
| TIME | 1.000 | 1.000 | 1.000 | 20 |
| URL | 1.000 | 1.000 | 1.000 | 29 |
| USERAGENT | 1.000 | 1.000 | 1.000 | 33 |
| USERNAME | 1.000 | 1.000 | 1.000 | 30 |
| VEHICLEVIN | 1.000 | 1.000 | 1.000 | 13 |
| VEHICLEVRM | 1.000 | 1.000 | 1.000 | 15 |
| ZIPCODE | 0.970 | 0.970 | 0.970 | 33 |
**Note on the `IP` label (F1 = 0.390):** The dataset contains three overlapping IP labels (`IP`, `IPV4`, `IPV6`). The low recall on `IP` comes from the model correctly identifying the address but tagging it as `IPV4` or `IPV6`, a label ambiguity in the dataset rather than a detection failure. Combined IP recall across all three labels is >0.95.
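The combined-recall figure above corresponds to collapsing the three labels before scoring; a small normalization helper (names here are illustrative) makes that concrete:

```python
# Collapse the three overlapping IP labels before scoring, so an address
# tagged IPV4 or IPV6 still counts as a correct IP detection.
IP_LABELS = {"IP", "IPV4", "IPV6"}

def normalize_label(label: str) -> str:
    return "IP" if label in IP_LABELS else label

def normalize_entities(entities: list) -> list:
    return [{"text": e["text"], "label": normalize_label(e["label"])}
            for e in entities]
```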
## How to Get Started

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```
### Load and Run Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch, json

# 4-bit NF4 quantization, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "vineeth453/qwen25-7b-pii-detection-lora")
tokenizer = AutoTokenizer.from_pretrained("vineeth453/qwen25-7b-pii-detection-lora")
model.eval()

def detect_pii(text: str) -> dict:
    prompt = (
        "<|im_start|>system\n"
        "You are a PII detection system. Extract all personally identifiable information.\n"
        'Return ONLY valid JSON: {"entities":[{"text":"...","label":"..."}]}\n'
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{text}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=200,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    ).strip().replace("<|im_end|>", "")
    return json.loads(response)

# English
print(detect_pii("Contact John Smith at [email protected] or call +1-555-867-5309"))
# {"entities": [{"text": "John", "label": "FIRSTNAME"}, {"text": "Smith", "label": "LASTNAME"},
#  {"text": "[email protected]", "label": "EMAIL"}, {"text": "+1-555-867-5309", "label": "PHONENUMBER"}]}

# German
print(detect_pii("Patient Lena Müller, born 14.03.1987, lives at Hauptstraße 22, Berlin."))
# {"entities": [{"text": "Lena", "label": "FIRSTNAME"}, {"text": "Müller", "label": "LASTNAME"},
#  {"text": "14.03.1987", "label": "DOB"}, {"text": "Hauptstraße", "label": "STREET"},
#  {"text": "22", "label": "BUILDINGNUMBER"}, {"text": "Berlin", "label": "STATE"}]}
```
## Training Details

### Training Data
- Dataset: ai4privacy/pii-masking-200k
- Size: 209,261 samples (198,797 train / 10,464 val, 95/5 split)
- Languages: English (43k), French (62k), German (53k), Italian (51k)
- Entity types: 56 PII categories
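The exact training-prompt template is not published here. Under the assumption that training examples mirror the ChatML layout used at inference (the `build_training_example` helper and its field names are hypothetical), a dataset record would be rendered roughly like this:

```python
import json

SYSTEM = (
    "You are a PII detection system. Extract all personally identifiable information.\n"
    'Return ONLY valid JSON: {"entities":[{"text":"...","label":"..."}]}'
)

def build_training_example(source_text: str, entities: list) -> str:
    """Render one record into the ChatML format used at inference
    (assumed training layout, not confirmed by the repo)."""
    target = json.dumps({"entities": entities}, ensure_ascii=False)
    return (
        f"<|im_start|>system\n{SYSTEM}\n<|im_end|>\n"
        f"<|im_start|>user\n{source_text}\n<|im_end|>\n"
        f"<|im_start|>assistant\n{target}<|im_end|>"
    )
```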
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA |
| Quantization | 4-bit NF4 + double quantization |
| Compute dtype | bfloat16 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 40,370,176 (0.53% of 7.6B) |
| Epochs | 1 |
| Per-device batch size | 4 |
| Gradient accumulation | 8 (effective batch = 32) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine decay |
| Warmup steps | 186 |
| Weight decay | 0.01 |
| Optimizer | paged_adamw_8bit |
| Max sequence length | 512 |
| Max grad norm | 1.0 |
| Hardware | NVIDIA A100 40GB |
| Training time | 10.7 hours |
| Final train loss | 0.00517 |
| Best val loss | 0.00330 |
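The LoRA rows of the table translate into a PEFT config roughly as follows; this is a reconstruction from the table, and the `bias` and `task_type` values are assumptions not stated above:

```python
from peft import LoraConfig

# QLoRA adapter configuration matching the hyperparameter table
# (bias and task_type are assumed defaults, not listed in the table).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```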
### Framework

`transformers==5.3.0`, `peft`, `bitsandbytes`, `accelerate`
## Uses

### Direct Use
Enterprise input guardrail systems for detecting and redacting PII from user queries before they reach an LLM. Suitable for HR, legal, healthcare, and financial applications where PII leakage into LLM prompts is a compliance risk.
### Downstream Use
- PII redaction pipelines
- Compliance auditing tools
- Data anonymization workflows
- GDPR / CCPA compliance enforcement
### Out-of-Scope Use
- Real-time inference at very high throughput without batching (7B model latency)
- Domains with highly specialized PII formats not covered by the training data
- Should not be used as the sole PII detection mechanism in high-stakes medical or legal settings without human review
## Bias, Risks, and Limitations

- **IP label ambiguity:** The model occasionally routes bare IP addresses to `IPV4` or `IPV6` instead of `IP` due to overlapping labels in the training data. Post-processing regex validation is recommended for IP-type entities.
- **CREDITCARDNUMBER vs PHONEIMEI:** 16-digit numeric strings without formatting context can be misclassified between these two labels (F1 = 0.879 for `CREDITCARDNUMBER`). Format-based post-processing (Luhn check) can mitigate this.
- **Low-support labels:** Labels with few examples (e.g., `CURRENCYNAME`, validation support = 8) have less reliable F1 estimates.
- **Language coverage:** Trained on EN/FR/DE/IT only. Performance may degrade on other languages.
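The regex validation and Luhn check recommended above can be sketched as follows; these are illustrative post-processing rules, not part of the model:

```python
import re

# Strict dotted-quad IPv4 matcher: four octets, each 0-255.
IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$"
)

def luhn_valid(number: str) -> bool:
    """Luhn checksum to separate credit card numbers from other digit strings
    (e.g., IMEIs also pass Luhn, but most random 16-digit strings do not)."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A guardrail pipeline can demote an `IP`/`IPV4` entity whose text fails `IPV4_RE`, or flag a `CREDITCARDNUMBER` entity that fails `luhn_valid`, before redaction decisions are made.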
## Environmental Impact
- Hardware: NVIDIA A100 40GB (Google Colab Pro)
- Training time: ~10.7 hours
- Cloud provider: Google Cloud (Colab)
- Compute region: US
## Model Card Authors

Vineeth (Master's project, Enterprise Guardrails System)