NSPL DeBERTa Prompt Injection Detector v1
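This model fine-tunes protectai/deberta-v3-base-prompt-injection-v2 on deepset/prompt-injections to improve recall on that dataset's test split. Recall rises from 36.7% to 81.7%, at a small cost in precision (100.0% to 98.0%).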
Performance
| Metric (deepset/prompt-injections test set, n=116) | Pretrained base model | Fine-tuned (this model) |
|---|---|---|
| Accuracy | 67.2% | 89.7% |
| Precision | 100.0% | 98.0% |
| Recall | 36.7% | 81.7% |
| F1 | 53.7% | 89.1% |
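The F1 row follows from the precision and recall rows as their harmonic mean. A minimal sketch that recomputes it from the rounded table values; the function name `f1_score` is illustrative and not part of NSPL:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the table above.
pretrained_f1 = f1_score(1.000, 0.367)
finetuned_f1 = f1_score(0.980, 0.817)

print(f"Pretrained F1: {pretrained_f1:.1%}")  # Pretrained F1: 53.7%
print(f"Fine-tuned F1: {finetuned_f1:.1%}")   # Fine-tuned F1: 89.1%
```

Both results match the reported F1 values, so the table is internally consistent.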
Cross-dataset results (ensemble that includes this model)
| Dataset | Recall | F1 |
|---|---|---|
| deepset test (n=116) | 88.3% | 92.2% |
| Jailbreak prompts (n=79) | 100% | 100% |
| jackhhao (n=1044) | 98.7% | 80.5% |
| xTRam1 (n=2000) | 96.2% | 81.3% |
| Lakera CTF (n=1000) | 76.6% | 86.8% |
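The table reports recall and F1 but not precision. Since F1 is the harmonic mean of the two, precision can be estimated by solving F1 = 2PR / (P + R) for P. The sketch below does this from the rounded table values; the results are inferences with rounding error, not figures reported by the authors, and `implied_precision` is a hypothetical helper name.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, clamping to 1.0 to absorb rounding error."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 cannot exceed 2 * recall")
    return min(1.0, f1 * recall / denominator)


# (recall, F1) pairs copied from the table above, as fractions.
rows = {
    "deepset test": (0.883, 0.922),
    "jackhhao": (0.987, 0.805),
    "xTRam1": (0.962, 0.813),
    "Lakera CTF": (0.766, 0.868),
}

implied = {name: implied_precision(r, f1) for name, (r, f1) in rows.items()}
for name, p in implied.items():
    print(f"{name}: implied precision ~ {p:.1%}")
```

Under these assumptions, the gap between recall and F1 on jackhhao and xTRam1 corresponds to an estimated precision of roughly 68% and 70%, which suggests more false positives on those datasets than on deepset.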
Usage
from nspl.guards.classifier import ClassifierDetector
detector = ClassifierDetector(
    model_name="astoreyai/nspl-deberta-injection-v1",
    threshold=0.5,
)
result = detector.detect("Ignore all previous instructions")
print(result.triggered)  # True
print(result.score)      # 0.98
Or standalone with transformers:
from transformers import pipeline
classifier = pipeline("text-classification", model="astoreyai/nspl-deberta-injection-v1")
result = classifier("Ignore all previous instructions")
# [{'label': 'INJECTION', 'score': 0.98}]
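The standalone pipeline returns a label and a score but applies no decision threshold of its own. A minimal sketch of a threshold check over that output shape follows. The helper `is_injection` is hypothetical, the `INJECTION` label is taken from the example output above, and the name of the negative label is not stated in this card, so the check only tests for the positive one.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def is_injection(
    predictions: List[Prediction],
    threshold: float = 0.5,
    positive_label: str = "INJECTION",
) -> bool:
    """Return True if the top prediction is the positive label with score >= threshold."""
    top = predictions[0]
    return top["label"] == positive_label and float(top["score"]) >= threshold


# Uses the output shape shown above, so no model download is needed here.
sample = [{"label": "INJECTION", "score": 0.98}]
print(is_injection(sample))                  # True
print(is_injection(sample, threshold=0.99))  # False
```

Raising the threshold trades recall for precision, which may be worth tuning on datasets where false positives are more common.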
Training
- Base model: protectai/deberta-v3-base-prompt-injection-v2
- Dataset: deepset/prompt-injections (546 train, 116 test)
- Epochs: 3
- Learning rate: 2e-5
- Batch size: 16
- Training time: 56 seconds on CPU
- Framework: Hugging Face Transformers + Trainer
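The settings above imply a very short training run. The sketch below collects them under the parameter names used by `transformers.TrainingArguments` and works out the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these is stated in the list above.

```python
import math

# Values copied from the training list above; key names mirror
# transformers.TrainingArguments parameters.
hyperparameters = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
}
train_examples = 546

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(
    train_examples / hyperparameters["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * hyperparameters["num_train_epochs"]

print(steps_per_epoch, total_steps)  # 35 105
```

With only about a hundred optimizer steps, the reported training time of under a minute is on a small-data fine-tune rather than a full retraining.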
Part of NSPL
This model is part of NSPL (Non-Stochastic Protection Layer), a Python framework for deterministic LLM safety verification.
Citation
@software{storey2026nspl,
  author = {Storey, Aaron and McCardle, John},
  title  = {NSPL: Non-Stochastic Protection Layer for Agentic AI},
  year   = {2026},
  url    = {https://github.com/astoreyai/nspl}
}