🚗 Road Accident Severity Prediction Model

Model Description

This is a Random Forest Classifier trained to predict the severity of road accidents based on environmental, infrastructure, and spatio-temporal features. The model classifies accidents into three categories:

Slight (Label 2)
Serious (Label 1)
Fatal (Label 0)
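
The integer labels above can be mapped back to readable names when inspecting predictions. A minimal sketch; `SEVERITY_LABELS` and `decode_severity` are illustrative names and are not part of the released model:

```python
# Label encoding as listed above: 0 = Fatal, 1 = Serious, 2 = Slight.
SEVERITY_LABELS = {0: "Fatal", 1: "Serious", 2: "Slight"}


def decode_severity(predictions):
    """Map integer class labels (e.g. the output of model.predict) to names."""
    return [SEVERITY_LABELS[int(label)] for label in predictions]


print(decode_severity([2, 0, 1]))  # ['Slight', 'Fatal', 'Serious']
```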

The training data was rebalanced with SMOTE (Synthetic Minority Over-sampling Technique) to address the significant class imbalance in road accident data, where fatal incidents are rare but high-impact.
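
To illustrate the idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, standard-library illustration only: the actual resampling code for this model is not shown in this card, and the function name, parameters, and toy values are assumptions made for the example.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features (made-up values).
fatal = [(30.0, 1.0), (40.0, 2.0), (60.0, 3.0), (70.0, 5.0)]
new_points = smote_like_samples(fatal, n_new=3)
print(new_points)
```

Each synthetic point lies on a line segment between two real minority samples, so it never falls outside the range of the observed minority data.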

📊 Performance Summary

Overall Accuracy: 80.03%
Recall for Fatal/Serious: Improved by training on SMOTE-resampled data, prioritizing the detection of life-threatening outcomes over raw accuracy.
Model Size: 1.97 GB (trained on 630,000+ resampled observations).
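
Overall accuracy can hide weak performance on rare classes, which is why per-class recall matters here. The sketch below computes both on made-up toy labels; the numbers are illustrative only and are not results from this model. `per_class_recall` is a hypothetical helper; scikit-learn's `recall_score(..., average=None)` or `classification_report` would give the same per-class figures.

```python
def per_class_recall(y_true, y_pred, labels=(0, 1, 2)):
    """Recall per class: correctly predicted samples / actual samples of that class."""
    recall = {}
    for label in labels:
        actual = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recall[label] = hits / actual if actual else float("nan")
    return recall


# Made-up toy labels (0 = Fatal, 1 = Serious, 2 = Slight), not model output.
y_true = [2, 2, 2, 2, 2, 2, 1, 1, 0, 0]
y_pred = [2, 2, 2, 2, 2, 2, 1, 2, 2, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.8
print(per_class_recall(y_true, y_pred))  # {0: 0.5, 1: 0.5, 2: 1.0}
```

In this toy case the accuracy is 80% even though half of the fatal and serious cases are missed.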

🛠️ Intended Use

This model is designed for traffic safety researchers and city planners to:

Identify "High-Risk Profiles" for specific road segments.
Predict the likely outcome of an accident given specific weather, light, and speed conditions.
Serve as a backend for an intelligent traffic warning system.
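
For the warning-system use case, one possible approach is to flag inputs whose combined probability of a Fatal or Serious outcome exceeds a threshold. This is a sketch under assumptions: `flag_high_risk` and the 0.5 threshold are hypothetical, and the column order should be confirmed against `model.classes_` before relying on it.

```python
def flag_high_risk(probabilities, threshold=0.5):
    """Return True for rows where P(Fatal) + P(Serious) meets the threshold.

    Each row is assumed to be ordered [P(Fatal), P(Serious), P(Slight)],
    i.e. by label 0, 1, 2; check model.classes_ to confirm the column order.
    """
    return [(row[0] + row[1]) >= threshold for row in probabilities]


# Made-up probability rows standing in for model.predict_proba(...) output.
example = [
    [0.05, 0.15, 0.80],
    [0.30, 0.45, 0.25],
]
print(flag_high_risk(example))  # [False, True]
```

A lower threshold raises more warnings and misses fewer severe cases; the right value depends on how costly false alarms are for the application.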

📂 How to Load and Use

The model is serialized with joblib as a pickle file. Because the file is 1.97 GB, make sure at least 8 GB of RAM is available when loading it.

import joblib
import pandas as pd

# 1. Download the model file (road_accident_model.pkl) from this repo
model = joblib.load('road_accident_model.pkl')

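# 2. Prepare your input data
# The model expects a DataFrame with the same 62 feature columns used in
# training, including one-hot encoded variables
# (e.g., Speed_limit, Hour, Weather_Conditions_Raining, etc.)

# 3. Predict the class (0 = Fatal, 1 = Serious, 2 = Slight) and the class probabilities
# results = model.predict(your_input_dataframe)
# probabilities = model.predict_proba(your_input_dataframe)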
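
Because the model expects a fixed set of columns, raw records need to be encoded and ordered the same way as the training data. The sketch below shows the idea with the standard library only. `to_feature_row` is a hypothetical helper, and `expected_columns` is an invented subset of the 62 columns (`Weather_Conditions_Fine` is an assumed name). If the model was fitted on a DataFrame with a recent scikit-learn version, the real column list may be available as `model.feature_names_in_`.

```python
def to_feature_row(record, expected_columns):
    """Build one numeric row in the given column order.

    Numeric fields are copied as-is; string fields are one-hot encoded as
    "<field>_<value>" (e.g. Weather_Conditions_Raining). Columns absent
    from the record are filled with 0.
    """
    encoded = {}
    for field, value in record.items():
        if isinstance(value, str):
            encoded[f"{field}_{value}"] = 1
        else:
            encoded[field] = value
    return [encoded.get(column, 0) for column in expected_columns]


# Hypothetical subset of the 62 training columns, for illustration only.
expected_columns = [
    "Speed_limit",
    "Hour",
    "Weather_Conditions_Fine",
    "Weather_Conditions_Raining",
]
record = {"Speed_limit": 30, "Hour": 17, "Weather_Conditions": "Raining"}
print(to_feature_row(record, expected_columns))  # [30, 17, 0, 1]
```

With pandas, the same alignment can be done by one-hot encoding the raw frame and calling `DataFrame.reindex(columns=expected_columns, fill_value=0)`. Note that a category value not seen in training is silently dropped by this approach.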

🏗️ Training Data
Source: Historical Road Accident Data (307,973 initial records).
Features: 62 engineered features including:
Spatio-Temporal: Latitude, Longitude, Hour of Day, Day of Week.
Infrastructure: Speed Limit, Road Type, Junction Control.
Environmental: Weather Conditions, Light Conditions, Road Surface Conditions.
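
As an example of the spatio-temporal features listed above, hour of day and day of week can be derived from a timestamp. This is a sketch, not the original feature-engineering code: the `Day_of_Week` column name and the Monday = 0 encoding are assumptions and may differ from what the training data uses.

```python
from datetime import datetime


def temporal_features(timestamp):
    """Derive hour of day and day of week from an ISO 8601 timestamp string."""
    moment = datetime.fromisoformat(timestamp)
    return {
        "Hour": moment.hour,
        # Monday = 0 ... Sunday = 6 (Python's convention; the training data
        # may use a different encoding).
        "Day_of_Week": moment.weekday(),
    }


print(temporal_features("2024-03-15T17:45:00"))  # {'Hour': 17, 'Day_of_Week': 4}
```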

⚠️ Limitations & Bias
Behavioral Data: This model does not include driver demographics (age, experience) or behavioral data (alcohol, distraction), which are major contributors to accidents.
Geography: The model is trained on specific regional data; its accuracy may decrease if applied to countries with significantly different road infrastructures or driving cultures.

👤 Author
Hayam Wahdan
GitHub Profile: https://github.com/hayamwahdan
LinkedIn: https://www.linkedin.com/in/hayamwahdan
