Road Accident Severity Prediction Model
Model Description
This is a Random Forest Classifier trained to predict the severity of road accidents based on environmental, infrastructure, and spatio-temporal features. The model classifies accidents into three categories:
- Slight (Label 2)
- Serious (Label 1)
- Fatal (Label 0)
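Label 0 is the most severe class and 2 the least, and predictions come back as these integers. Below is a minimal sketch of decoding them into names. It assumes the model's labels are exactly 0, 1 and 2 as listed above. `SEVERITY_LABELS` and `decode_severity` are hypothetical names and are not part of this repository.

```python
# Hypothetical lookup table mirroring the label list above
SEVERITY_LABELS = {0: "Fatal", 1: "Serious", 2: "Slight"}


def decode_severity(labels):
    """Translate integer predictions (e.g. from model.predict) into names."""
    return [SEVERITY_LABELS[int(label)] for label in labels]


print(decode_severity([2, 0, 1]))  # ['Slight', 'Fatal', 'Serious']
```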
The model was trained on data rebalanced with SMOTE (Synthetic Minority Over-sampling Technique) to address the severe class imbalance in road accident data, where fatal incidents are rare but high-impact.
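The card does not say which SMOTE implementation was used. The core idea can still be sketched in plain Python. Each synthetic sample is a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. `smote` below is a hypothetical helper for illustration only. In practice, the `SMOTE` class from the imbalanced-learn package (`SMOTE(...).fit_resample(X, y)`) is the usual choice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points from a list of minority-class feature tuples
    (needs at least two samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean)
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment from `base` towards `neighbour`
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


fatal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy minority samples
new_points = smote(fatal, n_synthetic=3, k=2)
print(len(new_points))  # 3
```

Every synthetic point is a convex combination of two real points, so it stays inside the region spanned by the minority class. Resampling is normally applied to the training split only, so that test metrics reflect the real class distribution.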
Performance Summary
- Overall Accuracy: 80.03%
- Recall for Fatal/Serious: improved by training on SMOTE-resampled data, prioritizing detection of severe (life-threatening) accidents over raw overall accuracy.
- Model Size: 1.97 GB (trained on 630,000+ resampled observations).
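Overall accuracy alone can hide poor performance on rare classes, which is why recall on Fatal/Serious matters here. The sketch below computes both from plain label lists. The labels are a toy example and are not this model's outputs. scikit-learn's `classification_report` and `recall_score(..., average=None)` report the same per-class recall.

```python
from collections import Counter


def accuracy_and_recall(y_true, y_pred):
    """Return overall accuracy and a {label: recall} dict."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    support = Counter(y_true)  # true occurrences of each class
    hits = Counter(t for t, p in pairs if t == p)  # correctly predicted occurrences
    recall = {label: hits[label] / count for label, count in sorted(support.items())}
    return correct / len(pairs), recall


# Toy labels, not results from this model: 0 = Fatal, 1 = Serious, 2 = Slight
y_true = [2, 2, 2, 2, 2, 2, 2, 2, 1, 0]
y_pred = [2, 2, 2, 2, 2, 2, 2, 2, 1, 2]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(accuracy)  # 0.9
print(recall)    # {0: 0.0, 1: 1.0, 2: 1.0}
```

Here 90% accuracy coexists with zero recall on the Fatal class, because the single fatal case was predicted as Slight.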
Intended Use
This model is designed for traffic safety researchers and city planners to:
- Identify "High-Risk Profiles" for specific road segments.
- Predict the likely outcome of an accident given specific weather, light, and speed conditions.
- Serve as a backend for an intelligent traffic warning system.
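For the warning-system use case, one simple approach is to threshold the combined probability of the two severe classes from `predict_proba`. This is a sketch under stated assumptions. `is_high_risk` is a hypothetical helper, and the default threshold of 0.3 is an arbitrary placeholder that would need tuning on validation data.

```python
def is_high_risk(proba, classes=(0, 1, 2), threshold=0.3):
    """Flag a scenario when P(Fatal) + P(Serious) reaches `threshold`.

    `proba` is one row of predict_proba output; `classes` gives the label of
    each column (model.classes_ in scikit-learn). The default threshold is a
    placeholder, not a value tuned for this model.
    """
    severe = {0, 1}  # 0 = Fatal, 1 = Serious
    risk = sum(p for label, p in zip(classes, proba) if label in severe)
    return risk >= threshold, risk


print(is_high_risk([0.25, 0.25, 0.5]))    # (True, 0.5)
print(is_high_risk([0.0, 0.125, 0.875]))  # (False, 0.125)
```

Reading column labels from `classes` (in scikit-learn, `model.classes_`) instead of hard-coding positions keeps the check correct even if the class order differs.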
How to Load and Use
The model is saved as a serialized joblib pickle file. Because of its size, ensure you have sufficient RAM (at least 8GB) when loading.
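```python
import joblib
import pandas as pd

# 1. Download the model file (road_accident_model.pkl) from this repo and load it
model = joblib.load("road_accident_model.pkl")

# 2. Prepare your input data: 62 columns, including one-hot encoded variables
#    (e.g., Speed_limit, Hour, Weather_Conditions_Raining), in training order.
#    Assumes the model was fitted on a DataFrame, so scikit-learn exposes the
#    names in model.feature_names_in_; this builds one all-zero placeholder row.
your_input_dataframe = pd.DataFrame([dict.fromkeys(model.feature_names_in_, 0)])

# 3. Predict labels (0 = Fatal, 1 = Serious, 2 = Slight) and class probabilities
results = model.predict(your_input_dataframe)
probabilities = model.predict_proba(your_input_dataframe)  # columns follow model.classes_
```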
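Building the 62-column input by hand is error-prone. Below is a minimal sketch of aligning a raw record to the training columns. It assumes one-hot columns are named `<feature>_<category>`, which is the pandas `get_dummies` default and is consistent with `Weather_Conditions_Raining` above. `align_features` and the five column names in the example are hypothetical. The real list would come from the training pipeline, or from `model.feature_names_in_` if the model was fitted on a DataFrame.

```python
def align_features(raw, expected_columns):
    """Turn a raw record into one row ordered like `expected_columns`.

    String values are treated as categories and become "<feature>_<category>"
    one-hot columns (assumed naming); numeric values are kept as-is.
    Expected columns missing from the record are filled with 0.
    """
    row = {}
    for name, value in raw.items():
        if isinstance(value, str):
            row[f"{name}_{value}"] = 1  # categorical -> one-hot column
        else:
            row[name] = value  # numeric feature kept as-is
    unknown = set(row) - set(expected_columns)
    if unknown:
        raise ValueError(f"columns not seen during training: {sorted(unknown)}")
    return [row.get(col, 0) for col in expected_columns]


columns = ["Speed_limit", "Hour", "Weather_Conditions_Fine",
           "Weather_Conditions_Raining", "Light_Conditions_Daylight"]  # hypothetical subset
record = {"Speed_limit": 30, "Hour": 17, "Weather_Conditions": "Raining"}
print(align_features(record, columns))  # [30, 17, 0, 1, 0]
```

Raising on unseen columns catches misspelled categories, which would otherwise silently become an all-zero one-hot group.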
Training Data
Source: Historical Road Accident Data (307,973 initial records).
Features: 62 engineered features including:
- Spatio-Temporal: Latitude, Longitude, Hour of Day, Day of Week.
- Infrastructure: Speed Limit, Road Type, Junction Control.
- Environmental: Weather Conditions, Light Conditions, Road Surface Conditions.
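Hour of Day and Day of Week are typically derived from the accident timestamp. Below is a sketch using only the standard library. It assumes `Hour` runs from 0 to 23 and that day of week follows Python's `weekday()` convention (Monday = 0). The card does not document the encoding used in training, and `Day_of_Week` is a hypothetical column name.

```python
from datetime import datetime


def temporal_features(timestamp):
    """Return Hour (0-23) and Day_of_Week (assumed: Monday = 0 ... Sunday = 6)."""
    return {"Hour": timestamp.hour, "Day_of_Week": timestamp.weekday()}


print(temporal_features(datetime(2024, 3, 15, 17, 40)))  # {'Hour': 17, 'Day_of_Week': 4}
```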
Limitations & Bias
Behavioral Data: This model does not include driver demographics (age, experience) or behavioral data (alcohol, distraction), which are major contributors to accidents.
Geography: The model is trained on specific regional data; its accuracy may decrease if applied to countries with significantly different road infrastructures or driving cultures.
Author
Hayam Wahdan
GitHub Profile: https://github.com/hayamwahdan
LinkedIn: https://www.linkedin.com/in/hayamwahdan