Traffic accident severity prediction for secondary highways based on cluster analysis and SVM model

doi: 10.16265/j.cnki.issn1003-3033.2022.05.1263

China Safety Science Journal, 2022, Vol. 32, Issue (5): 163-169.

YANG Wenchen¹, ZHOU Yanning²,³, TIAN Bijiang¹, GUO Fengxiang³,**, HU Chengyu¹

¹ National Engineering Laboratory for Surface Transportation Weather Impacts Prevention, Broadvision Engineering Consultants Co., Ltd., Kunming Yunnan 650200, China
² Shenzhen Urban Transportation Planning Center Co., Ltd., Shenzhen Guangdong 518057, China
³ Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming Yunnan 650500, China

Received: 2021-12-11 Revised: 2022-03-12 Published: 2022-05-28
Corresponding author:
**GUO Fengxiang (born 1978), female, from Hailin, Heilongjiang; Ph.D., professor, master's supervisor. Research interests: traffic safety and driving behavior. E-mail: gfxebox@qq.com.
About the authors:
YANG Wenchen (born 1985), male, from Changning, Yunnan; Ph.D., senior engineer, master's supervisor. Research interests: road traffic safety and environment, and intelligent traffic control systems. E-mail: tongjiywc@163.com.
TIAN Bijiang, senior engineer
HU Chengyu, professor-level senior engineer
Funding:
National Natural Science Foundation of China (71961012); National Key Research and Development Program of China (2017YFC0803906); Yunnan Fundamental Research Program (2019FB072); Company Self-initiated Science and Technology Project (ZL-2019-04)

Abstract:

To clarify how input features affect machine-learning models for traffic accident severity (TAS) prediction, 12 factors influencing accident severity were selected as candidate feature variables, based on 1 808 accident records from mountainous secondary highways. The K-means (KM) clustering algorithm was used to discretize the continuous feature variables, and the random forest (RF) algorithm was used to identify the important feature variables. Three kinds of input features (candidate features, KM features, and RF features) were then combined with the support vector machine (SVM) algorithm to build three TAS prediction models (SVM*, KM-SVM, and RF-SVM), and their prediction performance and applicability were analyzed. The results show that discretizing continuous variables and identifying key feature parameters significantly improve the prediction accuracy of the RF-SVM model, with the accuracy for serious-injury and fatal accidents improved by about 40%. Feature selection affects SVM model performance less than discretization of continuous variables does. The RF-SVM model achieves better prediction performance than a binary logistic regression model, but it is more sensitive to the choice of input features.

Key words: K-means (KM) clustering, support vector machine (SVM), accident severity, secondary mountainous highway, random forest (RF), machine learning
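The abstract describes discretizing continuous feature variables with K-means clustering. A minimal sketch of that idea is given below, assuming a plain one-dimensional Lloyd's algorithm with a deterministic, evenly spread initialization. The function names `kmeans_1d` and `discretize` and the sample values are hypothetical; the paper's actual settings (number of clusters, initialization, software) are not stated on this page.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Run Lloyd's algorithm on scalar values and return the sorted cluster centers."""
    data = sorted(values)
    n = len(data)
    # Assumption: deterministic start, centers spread evenly over the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def discretize(value, centers):
    """Return the 1-based index of the center nearest to value."""
    return 1 + min(range(len(centers)), key=lambda j: abs(value - centers[j]))


# Hypothetical values of one continuous feature (not the paper's data).
samples = [150, 200, 260, 310, 900, 1000, 1100, 1250, 1900, 2100, 2300]
centers = kmeans_1d(samples, k=3)
codes = [discretize(v, centers) for v in samples]
print(centers, codes)
```

Each continuous value is replaced by the index of its cluster, so the downstream classifier sees a small number of ordered categories instead of the raw measurement.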

YANG Wenchen, ZHOU Yanning, TIAN Bijiang, GUO Fengxiang, HU Chengyu. Traffic accident severity prediction for secondary highways based on cluster analysis and SVM model[J]. China Safety Science Journal, 2022, 32(5): 163-169.

Figures/Tables (10)

Fig. 1

Fig. 2

Table 1

Definition of potential input feature parameters

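Feature parameter	Value range	Discrete coding
Gender of at-fault party $x_1$	-	0=male, 1=female
Age of at-fault party $x_2$ /years	18~71	1=(<45), 2=[45,60), 3=(≥60)
Season of accident $x_3$	-	1=spring, 2=summer, 3=autumn, 4=winter
Time of accident $x_4$	0~24	1=[0,6), 2=[6,12), 3=[12,18), 4=[18,24)
R ($x_5$) /m	>100	1=(<800), 2=[800,1 600), 3=(≥1 600)
L ($x_6$) /m	170~1 800	0=(<1 100), 1=(≥1 100)
D ($x_7$) /(°)	0~6	1=(<2), 2=[2,4), 3=(≥4)
Accident type $x_8$	-	1=rear-end, 2=collision with fixed object, 3=rollover, 4=side collision, 5=collision with pedestrian, 6=head-on collision, 7=other
Vehicle type of at-fault party $x_9$	-	1=small vehicle, 2=medium vehicle, 3=large vehicle, 4=non-motor vehicle
Vehicle type of other involved party $x_{10}$	-	1=small vehicle, 2=medium vehicle, 3=large vehicle, 4=non-motor vehicle, 5=other
C_Q ($x_{11}$)	0.1~3.0	1=[0.00, 0.65], 2=(0.65, 1.34], 3=(1.34, 2.00], 4=(>2.00)
Weather condition $x_{12}$	-	0=sunny, 1=not sunny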
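The interval codes in Table 1 can be applied to raw values with a small lookup. The sketch below transcribes the bin edges for the continuous variables from the table; the dictionary keys (`age`, `hour`, `R`, `L`, `D`, `C_Q`), the `encode` helper, and the sample record are illustrative assumptions, not part of the paper.

```python
from bisect import bisect_left, bisect_right

# Left-closed bins such as [45,60): a value equal to an edge moves to the next code.
LEFT_CLOSED = {
    "age": [45, 60],       # 1=(<45), 2=[45,60), 3=(>=60)
    "hour": [6, 12, 18],   # 1=[0,6), 2=[6,12), 3=[12,18), 4=[18,24)
    "R": [800, 1600],      # 1=(<800), 2=[800,1600), 3=(>=1600)
    "D": [2, 4],           # 1=(<2), 2=[2,4), 3=(>=4)
}
# Right-closed bins such as (0.65,1.34]: a value equal to an edge keeps the lower code.
RIGHT_CLOSED = {
    "C_Q": [0.65, 1.34, 2.00],  # 1=[0,0.65], 2=(0.65,1.34], 3=(1.34,2.00], 4=(>2.00)
}


def encode(name, value):
    """Return the discrete code that Table 1 assigns to a continuous value."""
    if name in LEFT_CLOSED:
        return 1 + bisect_right(LEFT_CLOSED[name], value)
    if name in RIGHT_CLOSED:
        return 1 + bisect_left(RIGHT_CLOSED[name], value)
    if name == "L":
        return int(value >= 1100)  # 0=(<1100), 1=(>=1100)
    raise KeyError(f"no bins defined for {name!r}")


# Hypothetical record chosen to sit on several bin boundaries.
record = {"age": 45, "hour": 18.5, "R": 799, "L": 1100, "D": 4, "C_Q": 0.65}
coded = {name: encode(name, value) for name, value in record.items()}
print(coded)
```

Using `bisect_right` for left-closed bins and `bisect_left` for right-closed bins keeps the boundary behavior consistent with the bracket notation in the table.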

Fig. 3

Fig. 4

Table 2

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Model	C	γ	Prediction accuracy, non-severe accidents /%	Prediction accuracy, severe accidents /%	Prediction accuracy, overall /%
SVM*	9	0.07	91.15	18.38	64.09
KM-SVM	1.1	0.09	83.63	58.82	81.77
RF-SVM	32	0.06	88.93	67.65	82.60
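The accuracy gains over the baseline SVM* model can be read off the table above with simple arithmetic. The sketch below transcribes the accuracy columns and computes the differences in percentage points; the dictionary layout and key names are illustrative assumptions.

```python
# Accuracy values (in %) transcribed from the table above.
accuracy = {
    "SVM*":   {"non_severe": 91.15, "severe": 18.38, "overall": 64.09},
    "KM-SVM": {"non_severe": 83.63, "severe": 58.82, "overall": 81.77},
    "RF-SVM": {"non_severe": 88.93, "severe": 67.65, "overall": 82.60},
}

baseline = accuracy["SVM*"]

# Gain of each model over SVM*, in percentage points, per accuracy column.
gains = {
    model: {col: round(row[col] - baseline[col], 2) for col in row}
    for model, row in accuracy.items()
    if model != "SVM*"
}

for model, gain in gains.items():
    print(model, gain)
```

On these figures, severe-accident accuracy rises by 40.44 points for KM-SVM and 49.27 points for RF-SVM, while non-severe accuracy falls slightly for both models.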
