A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data

Yumak, Ahmet; Hengirmen Tercan, Safak; Colak, Umut Can; Ozcanan, Sedat

doi:10.3390/app152312824

Date

2025

Authors

Yumak, Ahmet

Hengirmen Tercan, Safak

Colak, Umut Can

Ozcanan, Sedat

Publisher

MDPI

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Traffic accidents remain a major public safety concern, particularly in regions where rapid motorization and limited infrastructure increase crash risk. This study proposes a machine learning-based framework to classify traffic accident severity and identify high-risk road segments using multidimensional crash data from Şırnak Province, Turkey. The dataset, obtained from the General Directorate of Security (EGM), contains 29 variables describing traffic, geometric, and operational roadway characteristics for crashes reported between 2018 and 2023. Due to the severe imbalance between injury and fatal crashes, the Synthetic Minority Oversampling Technique (SMOTE) was applied to enhance model sensitivity to the minority class. Five classifiers, namely Logistic Regression (LR), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained and evaluated using accuracy, F1-score, ROC-AUC, and alarm metrics. Results from the original dataset showed that several models struggled to detect fatal crashes, while LR demonstrated moderate sensitivity. After SMOTE, performance improved across all models. XGBoost achieved the highest F1-score (0.61) with the lowest False Alarm rate (0.01), followed by RF and MLP, whereas SVM and LR yielded comparatively lower accuracy. Computation time analysis indicated that LR and SVM had the fastest runtimes, while MLP and XGBoost required longer training times. Overall, the findings highlight the effectiveness of ensemble models, particularly XGBoost, in capturing critical crash patterns and supporting risk-based decision-making. Future work should incorporate time-series analysis and GIS-based spatial modeling to further enhance predictive capability and inform geographically targeted safety interventions.
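
As a rough illustration of two of the evaluation metrics named in the abstract, the sketch below computes an F1-score and a false alarm rate from binary predictions. It is not the authors' code. The abstract does not define its False Alarm rate, so the definition used here (false positives divided by all actual negatives) is an assumption, as are the function names, the label coding (1 = fatal, 0 = injury), and the toy labels.

```python
# Minimal sketch (assumptions: 1 = fatal crash, 0 = injury crash;
# false alarm rate = FP / (FP + TN); all names and data are illustrative).

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def false_alarm_rate(fp, tn):
    """Share of actual negatives flagged as positive; 0.0 when undefined."""
    denom = fp + tn
    return fp / denom if denom else 0.0


# Toy, made-up labels: 8 injury crashes and 2 fatal crashes.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
f1 = f1_from_counts(tp, fp, fn)
far = false_alarm_rate(fp, tn)
print(f"F1 = {f1:.3f}, false alarm rate = {far:.3f}")
```

With heavily imbalanced classes, a model can reach high accuracy while missing most fatal crashes, which is why metrics computed on the minority class are reported alongside accuracy.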

Keywords

traffic accident, machine learning, classification, accident severity prediction, XGBoost, data mining

Source

Applied Sciences-Basel

WoS Quartile

N/A

Scopus Quartile

Q1

Volume

15

Issue

23

Link

https://doi.org/10.3390/app152312824
https://hdl.handle.net/11503/3439

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
