A Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data

dc.contributor.authorYumak, Ahmet
dc.contributor.authorHengirmen Tercan, Safak
dc.contributor.authorColak, Umut Can
dc.contributor.authorOzcanan, Sedat
dc.date.accessioned2026-01-22T19:51:40Z
dc.date.issued2025
dc.departmentŞırnak Üniversitesi
dc.description.abstractTraffic accidents remain a major public safety concern, particularly in regions where rapid motorization and limited infrastructure increase crash risk. This study proposes a machine learning-based framework to classify traffic accident severity and identify high-risk road segments using multidimensional crash data from Şırnak Province, Turkey. The dataset, obtained from the General Directorate of Security (EGM), contains 29 variables describing traffic, geometric, and operational roadway characteristics for crashes reported between 2018 and 2023. Due to the severe imbalance between injury and fatal crashes, the Synthetic Minority Oversampling Technique (SMOTE) was applied to enhance model sensitivity to the minority class. Five classifiers, namely Logistic Regression (LR), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were trained and evaluated using accuracy, F1-score, ROC-AUC, and alarm metrics. Results from the original dataset showed that several models struggled to detect fatal crashes, while LR demonstrated moderate sensitivity. After SMOTE, performance improved across all models. XGBoost achieved the highest F1-score (0.61) with the lowest False Alarm rate (0.01), followed by RF and MLP, whereas SVM and LR yielded comparatively lower accuracy. Computation time analysis indicated that LR and SVM had the fastest runtimes, while MLP and XGBoost required longer training times. Overall, findings highlight the effectiveness of ensemble models, particularly XGBoost, in capturing critical crash patterns and supporting risk-based decision-making. Future work should incorporate time-series analysis and GIS-based spatial modeling to further enhance predictive capability and inform geographically targeted safety interventions.
dc.description.sponsorshipŞırnak University [2024.FNAP.06.06.01.]
dc.description.sponsorshipThis study was supported by Şırnak University Scientific Research Projects Coordination Unit. Scientific Research Project No. 2024.FNAP.06.06.01.
dc.identifier.doi10.3390/app152312824
dc.identifier.issn2076-3417
dc.identifier.issue23
dc.identifier.orcid0000-0002-1729-6421
dc.identifier.scopus2-s2.0-105024688292
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.3390/app152312824
dc.identifier.urihttps://hdl.handle.net/11503/3439
dc.identifier.volume15
dc.identifier.wosWOS:001633972300001
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoen
dc.publisherMDPI
dc.relation.ispartofApplied Sciences-Basel
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20260122
dc.subjecttraffic accident
dc.subjectmachine learning
dc.subjectclassification
dc.subjectaccident severity prediction
dc.subjectXGBoost
dc.subjectdata mining
dc.titleA Machine Learning Approach to Identify High-Risk Road Segments and Accident Severity Patterns Based on Categorical Data
dc.typeArticle
