China Safety Science Journal, 2022, Vol. 32, Issue (5): 163-169. doi: 10.16265/j.cnki.issn1003-3033.2022.05.1263

• Public safety

Traffic accident severity prediction for secondary highways based on cluster analysis and SVM model

YANG Wenchen1, ZHOU Yanning2,3, TIAN Bijiang1, GUO Fengxiang3,**, HU Chengyu1

    1 National Engineering Laboratory For Surface Transportation Weather Impacts Prevention, Broadvision Engineering Consultants Co., Ltd., Kunming Yunnan 650200, China
    2 Shenzhen Urban Transportation Planning Center Co., Ltd., Shenzhen Guangdong 518057, China
    3 Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming Yunnan 650500, China
  • Received: 2021-12-11 Revised: 2022-03-12 Online: 2022-05-28 Published: 2022-11-28
  • Contact: GUO Fengxiang

Abstract:

To identify how input features influence machine-learning-based traffic accident severity (TAS) prediction models, 12 potential factors were first selected as input variables from 1808 accidents on a secondary mountainous highway. The K-means (KM) algorithm was then used to discretize the continuous feature variables, and the random forest (RF) algorithm was used to select key feature variables. Finally, three kinds of input feature variables (potential features, KM features, and RF features) were each combined with the support vector machine (SVM) algorithm to develop three TAS prediction models (SVM*, KM-SVM, and RF-SVM), and their prediction performance was systematically analyzed in terms of accuracy and applicability. The results show that discretizing continuous variables and identifying key feature parameters significantly improves the severity prediction accuracy of the RF-SVM model, with the accuracy for severe injuries or deaths improving by about 40%. Feature selection influences SVM model performance less than discretization of continuous variables does. Although the RF-SVM model predicts better than a binary logistic regression model, it is more sensitive to different input features.
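
The discretization step can be illustrated with a minimal sketch, assuming a plain one-dimensional K-means (Lloyd's algorithm) applied to a single continuous variable. The abstract does not state the number of clusters, the initialization, or which variables were clustered, so the function names `kmeans_1d` and `discretize`, the choice of k = 3, and the sample values below are all hypothetical and serve only to show the idea of turning a continuous feature into ordered categories.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on one continuous variable; returns sorted cluster centers."""
    ordered = sorted(set(values))
    if k < 1 or k > len(ordered):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic start (an assumption of this sketch): spread the initial
    # centers evenly over the distinct sorted values.
    if k == 1:
        centers = [mean(values)]
    else:
        centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def discretize(values, centers):
    """Map each value to the index of its nearest center (0 = lowest bin)."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]


# Hypothetical values of one continuous feature, invented for illustration only.
feature = [60, 65, 70, 200, 210, 220, 800, 820, 850]
centers = kmeans_1d(feature, k=3)
labels = discretize(feature, centers)
print(centers)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

In a pipeline of the kind the abstract describes, the resulting bin labels would replace the raw continuous values before feature selection and SVM training. Library implementations of K-means would normally be preferred over this hand-written version in practice.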

Key words: K-means (KM) clustering, support vector machine (SVM), accident severity, secondary mountainous highway, random forest (RF), machine learning