# ASL Recognition Model - Improved ✨
Improved ASL Recognition using 1D CNN + Bidirectional LSTM with Attention on MediaPipe landmarks.
## Performance

- Validation Accuracy: 64.61%
- Improvement: +438% relative to baseline (12% → 64.61%)
- Parameters: 3,258,574
- Best Epoch: 27
## Model Architecture

This improved model uses a hybrid architecture:

- **1D Convolutional Layers**: extract spatial features from hand landmarks
  - Conv1D (126 → 128 → 256) with BatchNorm and Dropout
- **Bidirectional LSTM**: models temporal dependencies across frames
  - 2 layers, 256 hidden units, bidirectional
- **Attention Mechanism**: focuses on the most informative frames in the sequence
- **Classification Head**: 512 → 256 → 77 classes with BatchNorm + Dropout
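The architecture above can be sketched in PyTorch. This is an illustrative reconstruction from the layer sizes listed here; kernel sizes, dropout placement, and the exact attention formulation are assumptions, and the class name `ASLHybridSketch` is hypothetical (it is not the released `ImprovedASLModel`):

```python
import torch
import torch.nn as nn

class ASLHybridSketch(nn.Module):
    """Sketch of the 1D CNN + BiLSTM + attention hybrid described above."""

    def __init__(self, num_classes=77, dropout=0.3):
        super().__init__()
        # 2 hands x 21 landmarks x 3 coords = 126 features per frame
        self.conv = nn.Sequential(
            nn.Conv1d(126, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
        )
        self.lstm = nn.LSTM(256, 256, num_layers=2, batch_first=True,
                            bidirectional=True, dropout=dropout)
        # Additive attention over frames: score -> softmax -> weighted sum
        self.attn = nn.Linear(512, 1)
        self.head = nn.Sequential(
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(256, num_classes),
        )

    def forward(self, x, mask=None):
        b, t = x.shape[:2]
        x = x.reshape(b, t, -1).transpose(1, 2)  # (b, 126, t)
        x = self.conv(x).transpose(1, 2)         # (b, t, 256)
        x, _ = self.lstm(x)                      # (b, t, 512)
        scores = self.attn(x).squeeze(-1)        # (b, t)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        w = torch.softmax(scores, dim=1).unsqueeze(-1)
        pooled = (x * w).sum(dim=1)              # attention-weighted frame pool
        return self.head(pooled)

logits = ASLHybridSketch()(torch.randn(4, 30, 2, 21, 3), torch.ones(4, 30))
```

The attention pool replaces a plain last-hidden-state readout, so frames the model scores as uninformative (e.g. padding or transition frames) contribute less to the final classification.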
## Key Improvements

### Phase 1 - Data

- ✅ Stratified train/val split (80/20) preserving class distribution
- ✅ Data augmentation: Gaussian noise on landmarks
- ✅ Class-weighted loss to handle class imbalance
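The Phase 1 steps can be sketched with scikit-learn and NumPy. The arrays here are toy stand-ins (the real dataset has 7,700 sequences), and the noise scale `sigma` is an assumed value:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-ins: 10 sequences per class instead of the real 100
X = rng.standard_normal((770, 30, 2, 21, 3)).astype(np.float32)
y = np.repeat(np.arange(77), 10)

# Stratified 80/20 split keeps per-class proportions identical in both sets
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Gaussian-noise augmentation on landmark coordinates (train set only)
def augment(batch, sigma=0.01):
    return batch + rng.normal(0.0, sigma, batch.shape).astype(np.float32)

# Inverse-frequency class weights for the weighted loss
counts = np.bincount(y_tr, minlength=77)
weights = counts.sum() / (77 * counts)
```

With a balanced dataset the weights all come out to 1.0; they only start to matter if some classes are under-represented.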
### Phase 2 - Model

- ✅ 1D CNN for better spatial feature extraction
- ✅ Bidirectional LSTM for temporal modeling
- ✅ Attention mechanism for frame importance
- ✅ BatchNorm for training stability
### Phase 3 - Training

- ✅ AdamW optimizer with weight decay (1e-4)
- ✅ OneCycleLR scheduler for better convergence
- ✅ Gradient clipping (max_norm=1.0)
- ✅ Early stopping (patience=15)
- ✅ Class-weighted CrossEntropyLoss
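The Phase 3 setup can be sketched as follows. The model is a trivial stand-in, the validation accuracy is a placeholder, and one batch stands for a full epoch; the point is where clipping, the scheduler step, and early stopping sit relative to each other:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(126 * 30, 77))  # stand-in model
class_weights = torch.ones(77)  # computed from training-set class frequencies

criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
steps_per_epoch, epochs = 193, 100  # 6,160 samples / batch 32 ≈ 193 steps
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=steps_per_epoch)

best_acc, patience, bad_epochs = 0.0, 15, 0
for epoch in range(epochs):
    # One illustrative batch; the real loop iterates a DataLoader
    x = torch.randn(32, 30, 126)
    loss = criterion(model(x), torch.randint(0, 77, (32,)))
    optimizer.zero_grad()
    loss.backward()
    # Clip AFTER backward, BEFORE the optimizer step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # OneCycleLR steps per batch, not per epoch

    val_acc = 0.0  # placeholder: evaluate on the validation split here
    if val_acc > best_acc:
        best_acc, bad_epochs = val_acc, 0  # checkpoint best model here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping
```

Because the placeholder validation accuracy never improves, this sketch stops exactly when `bad_epochs` reaches the patience of 15.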
### Phase 4 - Validation

- ✅ Proper stratified validation split
- ✅ No augmentation during validation
- ✅ Consistent preprocessing pipeline
## Vocabulary
- 77 classes: 26 letters (A-Z) + 51 common words
- Words: hello, goodbye, please, thank_you, yes, no, help, sorry, good, bad, friend, family, love, eat, drink, water, food, home, work, school, teacher, student, book, read, write, learn, understand, know, think, feel, want, need, have, go, come, see, hear, speak, sign, time, today, tomorrow, yesterday, morning, afternoon, evening, night, happy, sad, angry, tired
## Training Details

- Batch Size: 32
- Epochs: 100 (early stopped at 42)
- Learning Rate: 1e-3 (OneCycleLR)
- Weight Decay: 1e-4
- Dropout: 0.3
- Train Samples: 6,160
- Val Samples: 1,540
## Usage

```python
import torch
import numpy as np
from train_asl_improved import ImprovedASLModel

# Load model
model = ImprovedASLModel(num_classes=77)
checkpoint = torch.load('best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load label encoder
classes = np.load('label_encoder_classes.npy', allow_pickle=True)

# Inference
landmarks = torch.randn(1, 30, 2, 21, 3)  # (batch, seq_len, hands, landmarks, coords)
mask = torch.ones(1, 30)  # valid-frames mask
with torch.no_grad():
    logits = model(landmarks, mask)
    pred = torch.argmax(logits, dim=-1)
print(f"Predicted: {classes[pred.item()]}")
```
## Input Format

- Shape: (batch, 30, 2, 21, 3)
  - 30 frames (sequence length)
  - 2 hands (left + right)
  - 21 landmarks per hand (MediaPipe format)
  - 3 coordinates (x, y, z)
- Normalization: landmarks normalized to the [-1, 1] range
- Padding: zero-padded sequences with mask support
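Padding and mask construction for this input format can be sketched as below. The `preprocess` helper is hypothetical, and coordinate normalization is assumed to have happened upstream:

```python
import numpy as np

def preprocess(frames, seq_len=30):
    """Pad or truncate a variable-length landmark sequence to `seq_len`
    frames and build the valid-frames mask. `frames` has shape
    (T, 2, 21, 3) with coordinates already in [-1, 1]; padding frames
    are all zeros, and the mask is 1 for real frames, 0 for padding."""
    T = frames.shape[0]
    out = np.zeros((seq_len, 2, 21, 3), dtype=np.float32)
    mask = np.zeros(seq_len, dtype=np.float32)
    n = min(T, seq_len)
    out[:n] = frames[:n]
    mask[:n] = 1.0
    return out, mask

seq = np.random.uniform(-1, 1, (18, 2, 21, 3)).astype(np.float32)
padded, mask = preprocess(seq)  # 18 real frames, 12 zero-padded
```

The mask is what lets the attention mechanism ignore the zero-padded tail rather than treating it as real signal.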
## Training Results

| Metric | Baseline | Improved | Change |
|---|---|---|---|
| Val Accuracy | 12.01% | 64.61% | +52.6 pp |
| Train Accuracy | ~20% | 72.58% | ~+52.6 pp |
| Parameters | 323K | 3.26M | 10x |
### Validation Accuracy Timeline

- Epoch 1: 7.01%
- Epoch 5: 25.13%
- Epoch 10: 48.77%
- Epoch 15: 57.79%
- Epoch 20: 61.17%
- Epoch 27: 64.61% ← best
- Epoch 42: 62.40% (early stop)
## Next Steps to Reach 80%+
- More Data: Current 100 samples/class is minimal; collect 300-500 per class
- Better Augmentation: Time warping, speed perturbation, mixup
- Ensemble: Combine multiple models or use test-time augmentation
- Architecture: Try Transformer encoders instead of LSTM
- Signer Independence: Ensure train/val split by signer ID (not done in current dataset)
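As one concrete example of the augmentation ideas above, speed perturbation can be implemented by linearly resampling the frame axis. The `speed_perturb` helper is hypothetical and not part of the released code:

```python
import numpy as np

def speed_perturb(frames, rate, ):
    """Resample a (T, ...) landmark sequence in time by `rate` using
    linear interpolation between neighboring frames: rate > 1 shortens
    the sequence (signer appears faster), rate < 1 lengthens it."""
    T = frames.shape[0]
    new_T = max(2, int(round(T / rate)))
    src = np.linspace(0, T - 1, new_T)        # fractional source indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (src - lo).reshape(-1, *([1] * (frames.ndim - 1)))
    return frames[lo] * (1 - frac) + frames[hi] * frac

seq = np.random.randn(30, 2, 21, 3)
fast = speed_perturb(seq, rate=1.5)   # 20 frames
slow = speed_perturb(seq, rate=0.75)  # 40 frames
```

Resampled sequences would still be padded/truncated back to 30 frames before batching, so this varies signing speed without changing the model's input shape.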
## Files

- `best_model.pth`: model checkpoint (37 MB)
- `label_encoder_classes.npy`: class-labels mapping
- `model_config.json`: configuration and metadata
- `training_history.png`: loss/accuracy plots
- `training_history.npy`: training metrics
## Citation

If you use this model, please cite:

```bibtex
@misc{asl-recognition-improved,
  title={Improved ASL Recognition with CNN+LSTM},
  author={namratha2412},
  year={2025},
  howpublished={\url{https://huggingface.co/namratha2412/asl-recognition}}
}
```
- **License:** MIT
- **Dataset:** 7,700 sequences (100 per class)
- **Framework:** PyTorch 2.0+