China Safety Science Journal (中国安全科学学报), 2025, Vol. 35, Issue 4: 211-218. doi: 10.16265/j.cnki.issn1003-3033.2025.04.0774

• Public Safety •

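  • Corresponding author:
    **YU Lu (born 1990), female, native of Daqing, Heilongjiang; Ph.D., lecturer; research interests include traffic safety systems engineering. E-mail:
  • Author profile:

    XUE Le (born 1997), male, native of Jining, Shandong; master's student; research interests include road traffic safety and accident causation analysis. E-mail: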

  • Funding:
    Basic Scientific Research Project of the Liaoning Provincial Department of Education (LJKQZ20222462)

Causal analysis of highway accidents considering filling in missing values based on RF-Apriori algorithm

XUE Le¹, YU Lu (Lecturer)¹,**, JIN Longzhe (Professor)², LI Bo (Professor)¹, SHEN Wenjin¹

  1 School of Transportation Engineering, Dalian Jiaotong University, Dalian Liaoning 116028, China
  2 Research Institute of Macro-Safety Science, University of Science and Technology Beijing, Beijing 100083, China
  • Received: 2024-11-14 Revised: 2025-01-08 Published: 2025-04-28

Abstract:

To improve highway traffic safety, 26 320 highway traffic accident records from France for 2018–2022 were taken as the research object. Three representative algorithms were used to impute the missing values in the data: the random forest (RF) algorithm, the expectation-maximization (EM) algorithm, and the K-nearest neighbors (KNN) algorithm. The effect of each imputation algorithm on data stability was compared based on the change in variable variance before and after imputation. The Apriori association rule algorithm was then applied to the imputed dataset to analyze the causes of highway accidents at different severity levels. The results indicate that, after missing-value imputation, the RF algorithm shows the best stability; compared with the model trained on the original data, accuracy improved by 5.66%, recall by 9.22%, and F1 score by 9.91%. Passenger vehicles are more likely to cause property-damage accidents; motorcycles are prone to cause injury accidents on road sections with lower speed limits and fatal accidents on sections with higher speed limits. The use of safety equipment is strongly associated with accident severity level.
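As a rough illustration of two computational steps named in the abstract, the sketch below (Python standard library only) shows one possible variance-change measure for judging how much an imputer distorts a column, and a minimal Apriori implementation that keeps only rules whose consequent is a severity label. This is not the authors' code: the "stability" formula, the item names such as `vehicle=motorcycle`, the thresholds, and the six toy records are all hypothetical, and the RF, EM, and KNN imputers themselves are not implemented here.

```python
from itertools import combinations
from statistics import pvariance


def variance_change(observed, imputed):
    """Relative change in variance of one column after imputation.

    `observed` is the raw column with None marking missing entries;
    `imputed` is the same column after filling. Values near 0 mean the
    imputer kept the spread of the observed data (an assumed stability proxy).
    """
    known = [x for x in observed if x is not None]
    before = pvariance(known)
    after = pvariance(imputed)
    return (after - before) / before


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]  # level-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def severity_rules(transactions, severities, min_support, min_confidence):
    """Rules X -> s where s is a single severity label, sorted by confidence."""
    freq = apriori(transactions, min_support)
    rules = []
    for itemset, supp in freq.items():
        for sev in severities & itemset:
            antecedent = itemset - {sev}
            if not antecedent:
                continue
            # Subsets of frequent itemsets are frequent, so both lookups exist.
            conf = supp / freq[antecedent]
            lift = conf / freq[frozenset([sev])]
            if conf >= min_confidence:
                rules.append((antecedent, sev, supp, conf, lift))
    return sorted(rules, key=lambda r: -r[3])


# Hypothetical toy records: one set of "attribute=value" items per accident.
EXAMPLE = [
    {"vehicle=motorcycle", "speed_limit=low", "severity=injury"},
    {"vehicle=motorcycle", "speed_limit=low", "severity=injury"},
    {"vehicle=motorcycle", "speed_limit=high", "severity=fatal"},
    {"vehicle=motorcycle", "speed_limit=high", "severity=fatal"},
    {"vehicle=bus", "speed_limit=high", "severity=property_damage"},
    {"vehicle=bus", "speed_limit=low", "severity=property_damage"},
]
SEVERITIES = {"severity=property_damage", "severity=injury", "severity=fatal"}

if __name__ == "__main__":
    # Mean-style filling of [1, ?, 3] with 2 shrinks the variance.
    print(round(variance_change([1.0, None, 3.0], [1.0, 2.0, 3.0]), 3))
    for ant, sev, supp, conf, lift in severity_rules(
            EXAMPLE, SEVERITIES, min_support=0.3, min_confidence=0.8):
        print(sorted(ant), "->", sev,
              f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Restricting the consequent to a severity label mirrors the idea of analyzing causes per severity level; lift above 1 indicates that the antecedent conditions co-occur with that severity more often than chance. In a real pipeline, the imputed columns could come from, for example, scikit-learn's `KNNImputer` or an `IterativeImputer` with a random-forest estimator, but that is an assumption rather than the configuration reported in the paper.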

Key words: random forest (RF), Apriori algorithm, missing value, highway, accident cause, data filling, association rules
