SMS Spam Detection with BERT

🎯 A high-performance SMS spam classifier built with BERT achieving 99.16% accuracy.

Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:

  • HAM (legitimate message)
  • SPAM (unwanted/spam message)

Performance Metrics

Metric       Score
Accuracy     99.16%
Precision    97.30%
Recall       96.43%
F1-Score     96.86%
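
As a quick consistency check, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the figures listed above; the variable names are illustrative.

```python
# Reported values from the metrics above, expressed as fractions.
precision = 0.9730
recall = 0.9643

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.2%}")
# F1 = 96.86%
```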

Quick Start

Using Transformers Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]

Using AutoModel and AutoTokenizer

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input (truncate to the 128-token limit used during training)
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

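# Map to label (assumes index 0 is HAM and index 1 is SPAM;
# model.config.id2label holds the mapping stored with the model)
labels = ["HAM", "SPAM"]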
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
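
To make the softmax and argmax steps concrete, here is a dependency-free sketch of the same post-processing applied to a pair of made-up logits. The logit values are hypothetical and were not produced by this model; the label order follows the snippet above.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["HAM", "SPAM"]       # same order as in the snippet above
example_logits = [3.2, -2.9]   # hypothetical values, for illustration only

probs = softmax(example_logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(f"Prediction: {labels[predicted_class]} (confidence: {probs[predicted_class]:.4f})")
```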

Training Details

Dataset

  • Source: SMS Spam Collection Dataset
  • Total Messages: 5,574
  • Ham Messages: 4,827 (86.6%)
  • Spam Messages: 747 (13.4%)
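
Because the classes are imbalanced, accuracy alone can look flattering. One rough way to put the reported 99.16% in context is to compare it with a trivial baseline that labels every message as HAM. The sketch below computes that baseline from the counts above.

```python
ham_count = 4827
spam_count = 747
total = ham_count + spam_count  # 5,574 messages

# A classifier that always predicts HAM is right on every ham message
# and wrong on every spam message.
baseline_accuracy = ham_count / total
spam_share = spam_count / total

print(f"Always-HAM baseline accuracy: {baseline_accuracy:.1%}")
print(f"Spam share: {spam_share:.1%}")
```

Such a baseline would have zero recall on spam, so precision, recall, and F1 are more informative here than accuracy alone.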

Training Configuration

  • Base Model: bert-base-uncased
  • Max Sequence Length: 128 tokens
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Epochs: 3
  • Optimizer: AdamW

Data Split

  • Training: 80%
  • Validation: 20%
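
For a sense of scale, the split and configuration above imply roughly the following number of optimizer steps. This is a back-of-the-envelope sketch: it assumes the training split is rounded down to a whole message, the last partial batch is kept, and no gradient accumulation is used, none of which is stated in this card.

```python
import math

total_messages = 5574   # from the Dataset section
train_fraction = 0.80   # from the Data Split section
batch_size = 16         # from Training Configuration
epochs = 3              # from Training Configuration

# Assumption: the training split size is rounded down to a whole message.
train_size = int(total_messages * train_fraction)
val_size = total_messages - train_size

# Assumption: the last partial batch is kept, with no gradient accumulation.
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * epochs

print(train_size, val_size, steps_per_epoch, total_steps)
```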

Model Architecture

Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)

Use Cases

✅ Spam Filtering: Automatically filter spam messages in messaging applications
✅ SMS Gateway Protection: Protect users from phishing and scam attempts
✅ Content Moderation: Pre-screen messages in communication platforms
✅ Fraud Detection: Identify suspicious messages in financial apps

Limitations

  • Model is trained specifically on English SMS messages
  • May not generalize well to other languages or message formats
  • Performance may vary on messages with heavy slang or abbreviations
  • Trained on historical data, so it may miss spam patterns that emerged after the dataset was collected

Ethical Considerations

⚠️ Privacy: Ensure compliance with data protection regulations when processing user messages
⚠️ False Positives: Important legitimate messages might be incorrectly flagged as spam
⚠️ Bias: Model may reflect biases present in training data
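
One common way to limit false positives is to act only on SPAM predictions whose confidence clears a threshold and to let everything else through. The sketch below applies that idea to results in the format returned by the Transformers pipeline (a list of dicts with `label` and `score`). The `flag_spam` helper, the 0.95 threshold, and the sample scores are hypothetical; a suitable threshold would need to be tuned on validation data.

```python
def flag_spam(results, threshold=0.95):
    """Return True for each result labelled SPAM with score >= threshold."""
    return [r["label"] == "SPAM" and r["score"] >= threshold for r in results]

# Hypothetical pipeline-style outputs, for illustration only.
sample_results = [
    {"label": "SPAM", "score": 0.9987},
    {"label": "SPAM", "score": 0.62},
    {"label": "HAM", "score": 0.9991},
]

flags = flag_spam(sample_results)
print(flags)
# [True, False, False]
```

Raising the threshold trades recall for precision: fewer legitimate messages are blocked, but more spam gets through.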

Citation

If you use this model, please cite:

@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}

License

MIT License

Contact

For questions or feedback, please open an issue on the model repository.


Model Card: For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.
