NSPL DeBERTa Prompt Injection Detector v1

This model is protectai/deberta-v3-base-prompt-injection-v2 fine-tuned on deepset/prompt-injections to improve recall on that benchmark.

Performance

Metric       Pretrained    Fine-tuned (this model)
Accuracy     67.2%         89.7%
Precision    100.0%        98.0%
Recall       36.7%         81.7%
F1           53.7%         89.1%
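
As a quick consistency check, the F1 row can be recomputed from the reported precision and recall using the standard harmonic-mean formula. This is a minimal sketch; the helper name `f1_score` and the `reported` dictionary are illustrative and not part of NSPL.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above, as fractions.
reported = {
    "pretrained": {"precision": 1.000, "recall": 0.367},
    "fine_tuned": {"precision": 0.980, "recall": 0.817},
}

# Recompute F1 in percent, rounded to one decimal like the table.
f1_percent = {
    name: round(100 * f1_score(m["precision"], m["recall"]), 1)
    for name, m in reported.items()
}
print(f1_percent)
```

The recomputed values agree with the reported 53.7% and 89.1%.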

Cross-dataset (Ensemble with this model)

Dataset                     Recall    F1
deepset test (n=116)        88.3%     92.2%
Jailbreak prompts (n=79)    100%      100%
jackhhao (n=1044)           98.7%     80.5%
xTRam1 (n=2000)             96.2%     81.3%
Lakera CTF (n=1000)         76.6%     86.8%
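
The table reports recall and F1 but not precision. Since F1 = 2PR / (P + R), precision can be estimated by solving for P. The sketch below does that; the function name `implied_precision` is hypothetical, and because the inputs are rounded to one decimal place the results are estimates only, not figures reported by the authors.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P and clamp the result to at most 1.

    Rounded inputs can push the raw value slightly above 1, hence the clamp.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return min(1.0, f1 * recall / denom)


# (recall, F1) pairs copied from the table above, as fractions.
reported = {
    "deepset test": (0.883, 0.922),
    "Jailbreak prompts": (1.000, 1.000),
    "jackhhao": (0.987, 0.805),
    "xTRam1": (0.962, 0.813),
    "Lakera CTF": (0.766, 0.868),
}

for name, (recall, f1) in reported.items():
    print(f"{name}: implied precision ~ {implied_precision(recall, f1):.1%}")
```

On the rounded figures this gives roughly 68% for jackhhao and 70% for xTRam1, which would suggest that the lower F1 on those sets comes from false positives rather than missed injections.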

Usage

from nspl.guards.classifier import ClassifierDetector

detector = ClassifierDetector(
    model_name="astoreyai/nspl-deberta-injection-v1",
    threshold=0.5,
)
result = detector.detect("Ignore all previous instructions")
print(result.triggered)  # True
print(result.score)      # 0.98

Or standalone with transformers:

from transformers import pipeline

classifier = pipeline("text-classification", model="astoreyai/nspl-deberta-injection-v1")
result = classifier("Ignore all previous instructions")
# [{'label': 'INJECTION', 'score': 0.98}]
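
The standalone pipeline returns only the top label and its score, so applying a custom threshold (like the `threshold=0.5` argument in the NSPL example) takes one extra step. The sketch below is one way to do it, assuming the model is a two-class classifier so that any label other than `INJECTION` has probability `1 - score` for the injection class. The helper names are hypothetical and are not part of NSPL or transformers.

```python
from typing import Dict, List, Union

INJECTION_LABEL = "INJECTION"  # label name taken from the example output above


def injection_score(prediction: Dict[str, Union[str, float]]) -> float:
    """Convert a top-1 pipeline prediction into an injection probability.

    Assumption: binary classifier, so the non-injection class has
    probability 1 - score.
    """
    score = float(prediction["score"])
    return score if prediction["label"] == INJECTION_LABEL else 1.0 - score


def is_injection(prediction: Dict[str, Union[str, float]], threshold: float = 0.5) -> bool:
    """Flag the input when the injection probability reaches the threshold."""
    return injection_score(prediction) >= threshold


def classify(texts: List[str], threshold: float = 0.5) -> List[bool]:
    """Run the model on a list of texts (downloads the model on first call)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    classifier = pipeline(
        "text-classification",
        model="astoreyai/nspl-deberta-injection-v1",
    )
    return [is_injection(p, threshold) for p in classifier(texts)]
```

Calling `classify(["Ignore all previous instructions"], threshold=0.3)` would trade precision for recall; the right threshold depends on the deployment and is not specified by the model card.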

Training

  • Base model: protectai/deberta-v3-base-prompt-injection-v2
  • Dataset: deepset/prompt-injections (546 train, 116 test)
  • Epochs: 3
  • Learning rate: 2e-5
  • Batch size: 16
  • Training time: 56 seconds on CPU
  • Framework: Hugging Face Transformers + Trainer
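
The settings above map onto the Hugging Face Trainer roughly as follows. This is a sketch under stated assumptions, not the authors' training script: the `text` column name, `max_length=512`, the output directory, and the use of `DataCollatorWithPadding` are assumptions, and the step count assumes a single device with no gradient accumulation.

```python
import math

# Values taken from the list above.
BASE_MODEL = "protectai/deberta-v3-base-prompt-injection-v2"
DATASET = "deepset/prompt-injections"
TRAIN_EXAMPLES = 546
HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
}

# Optimizer steps implied by the settings (single device, no accumulation).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / HYPERPARAMS["per_device_train_batch_size"])
total_steps = steps_per_epoch * HYPERPARAMS["num_train_epochs"]


def fine_tune(output_dir: str = "nspl-deberta-finetuned"):
    """Fine-tune the base model; imports are lazy because they are heavy."""
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    dataset = load_dataset(DATASET)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL)

    def tokenize(batch):
        # Assumed column name "text"; max_length is an assumption.
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, **HYPERPARAMS),
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorWithPadding(tokenizer),
    )
    trainer.train()
    return trainer
```

With 546 training examples and a batch size of 16, this works out to 35 steps per epoch and 105 optimizer steps in total, which fits the short training time reported above.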

Part of NSPL

This model is part of NSPL (Non-Stochastic Protection Layer), a Python framework for deterministic LLM safety verification.

Citation

@software{storey2026nspl,
  author = {Storey, Aaron and McCardle, John},
  title = {NSPL: Non-Stochastic Protection Layer for Agentic AI},
  year = {2026},
  url = {https://github.com/astoreyai/nspl}
}