A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data
Abstract
Traffic accidents remain a major public safety concern, particularly in regions where rapid motorization and limited infrastructure increase crash risk. This study proposes a machine learning-based framework to classify traffic accident severity and identify high-risk road segments using multidimensional crash data from Şırnak Province, Turkey. The dataset, obtained from the General Directorate of Security (EGM), contains 29 variables describing the traffic, geometric, and operational characteristics of the roadway for crashes reported between 2018 and 2023. Because of the severe imbalance between injury and fatal crashes, the Synthetic Minority Oversampling Technique (SMOTE) was applied to improve model sensitivity to the minority (fatal) class. Five classifiers, namely Logistic Regression (LR), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained and evaluated using accuracy, F1-score, ROC-AUC, and alarm-based metrics. On the original dataset, several models struggled to detect fatal crashes, while LR showed moderate sensitivity. After SMOTE, performance improved across all models. XGBoost achieved the highest F1-score (0.61) with the lowest false alarm rate (0.01), followed by RF and MLP, whereas SVM and LR yielded comparatively lower accuracy. A computation-time analysis showed that LR and SVM had the fastest runtimes, while MLP and XGBoost required longer training times. Overall, the findings highlight the effectiveness of ensemble models, particularly XGBoost, in capturing critical crash patterns and supporting risk-based decision-making. Future work should incorporate time-series analysis and GIS-based spatial modeling to further improve predictive capability and inform geographically targeted safety interventions.
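The abstract gives no implementation details, so the following is only a minimal, dependency-free sketch of the two ideas it relies on: the SMOTE interpolation step and the F1 / false alarm metrics used to compare classifiers. Several assumptions apply. The features are treated as numeric vectors, because standard SMOTE interpolates numeric values; the study's categorical variables would in practice need an encoding or a categorical variant such as SMOTE-NC or SMOTE-N, and which approach the authors used is not stated. The false alarm rate is assumed to be FP / (FP + TN). The function names `smote` and `binary_metrics` and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling sketch for numeric feature vectors (hypothetical helper).

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the *other* minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """F1-score, hit rate and false alarm rate for a binary task (hypothetical helper).

    Assumption: false alarm rate = FP / (FP + TN); the abstract does not define it.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called hit rate / sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return {"f1": f1, "hit_rate": recall, "false_alarm_rate": false_alarm}


if __name__ == "__main__":
    # Toy minority class (e.g. fatal crashes) with two numeric features
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_samples = smote(minority, n_new=3, k=2)
    print(len(new_samples))  # 3

    # Toy labels: 1 = fatal, 0 = injury
    m = binary_metrics([1, 0, 0, 1, 0], [1, 0, 1, 0, 0])
    print(m["f1"], round(m["false_alarm_rate"], 3))  # 0.5 0.333
```

In practice, a maintained library such as imbalanced-learn (which provides `SMOTE`, `SMOTENC`, and `SMOTEN`) would normally replace the hand-written sampler, and oversampling is usually applied only to the training split so that synthetic records do not leak into the evaluation data.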









