Model Card for Vijil Prompt Injection

Model Details

Model Description

This model is a fine-tuned version of ModernBERT that classifies prompts as prompt injections, that is, inputs crafted to manipulate language models into producing unintended outputs.

  • Developed by: Vijil AI
  • License: apache-2.0
  • Fine-tuned from: ModernBERT

Uses

Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
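One way to apply the detector is as a gate in front of a language model call. The sketch below is illustrative only: the helper names, the blocked message, the 0.5 threshold, and especially the label string "INJECTION" are assumptions, not values documented for this model. Check model.config.id2label for the labels the model actually emits. The classifier argument is any callable that returns output in the shape produced by a transformers text-classification pipeline.

```python
from typing import Callable, Dict, List

# Shape of a transformers text-classification pipeline call:
# text -> [{"label": "...", "score": 0.0-1.0}]
Classifier = Callable[[str], List[Dict[str, object]]]

# Hypothetical message returned when a prompt is rejected.
BLOCKED_MESSAGE = "Request blocked: possible prompt injection."


def is_prompt_injection(
    classifier: Classifier,
    text: str,
    injection_label: str = "INJECTION",  # ASSUMPTION: verify against model.config.id2label
    threshold: float = 0.5,              # ASSUMPTION: tune on your own data
) -> bool:
    """Return True when the top prediction is the injection label with enough confidence."""
    top = classifier(text)[0]
    return top["label"] == injection_label and float(top["score"]) >= threshold


def guarded_call(
    classifier: Classifier,
    llm: Callable[[str], str],
    prompt: str,
    injection_label: str = "INJECTION",
    threshold: float = 0.5,
) -> str:
    """Forward the prompt to the LLM only if the detector does not flag it."""
    if is_prompt_injection(classifier, prompt, injection_label, threshold):
        return BLOCKED_MESSAGE
    return llm(prompt)
```

With the pipeline from the getting-started snippet, this could be called as guarded_call(classifier, my_llm, user_prompt), where my_llm is whatever function sends a prompt to your model.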

How to Get Started with the Model

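from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

# The tokenizer comes from the base ModernBERT checkpoint; the classifier weights come from the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

# Inputs longer than 512 tokens are truncated; the pipeline runs on the GPU when one is available.
classifier = pipeline(
  "text-classification",
  model=model,
  tokenizer=tokenizer,
  truncation=True,
  max_length=512,
  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

# Prints a list with one dict holding the predicted label and its score.
print(classifier("this is a prompt-injection prompt"))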
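Because the pipeline above truncates inputs at 512 tokens, text past that point is never seen by the classifier. One possible workaround, sketched here under stated assumptions, is to split long inputs into overlapping windows and classify each window separately. The function names and the default window sizes are hypothetical, and word counts are only a rough proxy for token counts, so truncation should stay enabled in the pipeline.

```python
from typing import Callable, List


def split_into_windows(text: str, window_words: int = 300, overlap_words: int = 50) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words.

    ASSUMPTION: 300 words is used as a rough stand-in for the 512-token limit;
    the real token count depends on the tokenizer and the text.
    """
    if window_words <= 0 or not 0 <= overlap_words < window_words:
        raise ValueError("need window_words > 0 and 0 <= overlap_words < window_words")
    words = text.split()
    if not words:
        return []
    step = window_words - overlap_words
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_words]))
        if start + window_words >= len(words):
            break  # this window already reaches the end of the text
    return windows


def flagged_windows(
    is_injection: Callable[[str], bool],
    text: str,
    window_words: int = 300,
    overlap_words: int = 50,
) -> List[str]:
    """Return every window that the supplied predicate flags as an injection."""
    return [w for w in split_into_windows(text, window_words, overlap_words) if is_injection(w)]
```

Here is_injection is any function that maps a string to True or False, for example a thin wrapper around the classifier pipeline that compares the predicted label against the model's injection label.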

Training Details

Training Data

The dataset used for training the model was taken from wildguardmix/train and safe-guard-prompt-injection/train.

Training Procedure

Supervised fine-tuning on the datasets listed above.

Training Hyperparameters

  • learning_rate: 5e-05

  • train_batch_size: 32

  • eval_batch_size: 32

  • optimizer: adamw_torch_fused

  • lr_scheduler_type: cosine_with_restarts

  • warmup_ratio: 0.1

  • num_epochs: 3
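For readers who want to reproduce a similar run, the hyperparameters above can be expressed as keyword arguments for transformers.TrainingArguments. This is a sketch, not the authors' training script: it assumes a single device (so the reported batch sizes map directly to the per-device arguments), and the output directory name is hypothetical.

```python
# Hyperparameters from the list above, keyed by transformers.TrainingArguments parameter names.
# ASSUMPTION: single-device training, so train_batch_size == per_device_train_batch_size.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.1,
    "num_train_epochs": 3,
}


def build_training_arguments(output_dir: str = "mbert-prompt-injection-run"):
    """Build a TrainingArguments object from the values above.

    The output_dir default is a hypothetical name. The import is done lazily so
    the dictionary can be inspected without transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would be passed to a transformers Trainer along with the model, tokenizer, and datasets.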

Evaluation

  • Training Loss: 0.0036

  • Validation Loss: 0.209392

  • Accuracy: 0.961538

  • Precision: 0.958362

  • Recall: 0.957055

  • F1: 0.957708
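As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall and compares it with the reported value; the variable names are illustrative.

```python
import math

# Values copied from the evaluation list above.
reported_precision = 0.958362
reported_recall = 0.957055
reported_f1 = 0.957708

# F1 is the harmonic mean of precision and recall.
computed_f1 = 2 * reported_precision * reported_recall / (reported_precision + reported_recall)

# The recomputed value agrees with the reported one to within rounding.
f1_is_consistent = math.isclose(computed_f1, reported_f1, abs_tol=1e-6)
print(f"computed F1 = {computed_f1:.6f}, consistent with report: {f1_is_consistent}")
```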

Testing Data

The dataset used for testing the model was taken from wildguardmix/test and safe-guard-prompt-injection/test.

Results

Model Card Contact

https://vijil.ai
