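YANG Wenchen1, ZHOU Yanning2,3, TIAN Bijiang1, GUO Fengxiang3,**, HU Chengyu1

    1 National Engineering Laboratory for Surface Transportation Weather Impacts Prevention, Broadvision Engineering Consultants Co., Ltd., Kunming, Yunnan 650020, China
    2 Shenzhen Urban Transportation Planning Center Co., Ltd., Shenzhen, Guangdong 518057, China
    3 Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
    Received: 2021-12-11  Revised: 2022-03-12  Published: 2022-05-28
    **GUO Fengxiang (born 1978), female, from Hailin, Heilongjiang; Ph.D., professor, master's supervisor; research interests include traffic safety and driving behavior. E-mail:
    YANG Wenchen (born 1985), male, from Changning, Yunnan; Ph.D., senior engineer, master's supervisor; research interests include road traffic safety and environment, and intelligent traffic control systems. E-mail:

    TIAN Bijiang, senior engineer

    HU Chengyu, professor-level senior engineer

    Funding: National Natural Science Foundation of China (71961012); National Key R&D Program of China (2017YFC0803906); Yunnan Fundamental Research Program (2019FB072); company self-funded science and technology project (ZL-2019-04)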

Traffic accident severity prediction for secondary highways based on cluster analysis and SVM model

YANG Wenchen1, ZHOU Yanning2,3, TIAN Bijiang1, GUO Fengxiang3,**, HU Chengyu1

    1 National Engineering Laboratory for Surface Transportation Weather Impacts Prevention, Broadvision Engineering Consultants Co., Ltd., Kunming Yunnan 650200, China
    2 Shenzhen Urban Transportation Planning Center Co., Ltd., Shenzhen Guangdong 518057, China
    3 Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming Yunnan 650500, China
    Received: 2021-12-11  Revised: 2022-03-12  Published: 2022-05-28


To clarify how input features affect machine-learning models for predicting traffic accident severity (TAS), 12 candidate factors were selected as feature variables from 1 808 accident records on mountainous secondary highways. The K-means (KM) clustering algorithm was used to discretize the continuous feature variables, and the random forest (RF) algorithm was used to identify the important feature variables. Combining each of three input feature sets (candidate features, KM features, and RF features) with the support vector machine (SVM) algorithm yielded three TAS prediction models (SVM*, KM-SVM, and RF-SVM), whose prediction performance and applicability were then analyzed. The results show that discretizing continuous variables and identifying key features significantly improves the prediction accuracy of the RF-SVM model, with the accuracy for serious-injury and fatal accidents improving by about 40%. Feature selection affects SVM model performance less than the discretization of continuous variables does. The RF-SVM model achieves better prediction performance than a binary logistic regression model, but it is more sensitive to the choice of input features.

Key words: K-means (KM) clustering, support vector machine (SVM), accident severity, mountainous secondary highway, random forest (RF), machine learning
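
The discretization step named in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional K-means applied to a single continuous feature, with cluster indices used as the discrete levels; the abstract does not give the authors' actual settings. The function names `kmeans_1d` and `discretize`, the deterministic initialization, the choice of k = 3, and the sample values (imagined curve radii in metres) are all hypothetical and not taken from the paper.

```python
from statistics import fmean


def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values into k groups and return the centroids in ascending order."""
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    m = len(distinct)
    # Deterministic start (an assumption of this sketch): pick k evenly
    # spaced distinct values instead of random seeds.
    centroids = [float(distinct[(2 * i + 1) * m // (2 * k)]) for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster happens to be empty.
        updated = [fmean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def discretize(values, centroids):
    """Map each value to the index of its nearest centroid (0 = lowest level)."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]


# Hypothetical continuous feature, e.g. curve radius in metres.
radii = [120, 130, 125, 400, 410, 395, 900, 880, 910]
centers = kmeans_1d(radii, k=3)
levels = discretize(radii, centers)
print(levels)
```

The resulting integer levels could then replace the raw continuous column before a classifier such as an SVM is trained. Whether the authors encoded the levels as ordinal integers or in some other way is not stated in the abstract.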