China Safety Science Journal, 2023, Vol. 33, Issue (2): 23-30. doi: 10.16265/j.cnki.issn1003-3033.2023.02.0034

• Safety Social Science and Safety Management •

Research on accident risk identification and influencing factors of bus drivers based on machine learning

ZHU Tong1, QIN Dan1, WEI Wen1, REN Jie1, FENG Yidong2,**

  1. College of Transportation Engineering, Chang'an University, Xi'an, Shaanxi 710064, China
  2. Research Institute of Highway, Ministry of Transport, Beijing 100088, China
  • Received: 2022-10-17 Revised: 2022-12-21 Published: 2023-02-28
  • Corresponding author:
    **FENG Yidong (b. 1990), male, from Lu'an, Anhui; master's degree, assistant research fellow; mainly engaged in traffic safety research. E-mail:
  • About the author:
    ZHU Tong (b. 1977), male, from Zhuji, Zhejiang; Ph.D., associate professor; engaged in research on traffic planning, traffic safety, and intelligent transportation. E-mail:
  • Funding:
    National Key Research and Development Program of China (2019YFE0108000); Scientific Research Project of the Department of Transport of Shaanxi Province (21-34R)

Abstract:

To identify accident-prone ("risky") drivers within a bus driver population, data were assembled from a city bus company's operational safety management system database, the Baidu application programming interface (API), and web crawling. Missing values were imputed with the K-nearest neighbor algorithm, yielding data on 42 routes and 1,893 drivers. Derived variables were constructed from basic feature variables describing the driver, vehicle, route characteristics, violations, accidents, and management. Features were selected with an ensemble approach that includes recursive feature elimination, penalized logistic regression, and random forest. Classification models were built with six machine learning methods, including extreme gradient boosting (XGBoost), and their hyperparameters were tuned by Bayesian optimization. The results show that, among the six classification models, the XGBoost model achieves the best area under the receiver operating characteristic (ROC) curve (AUC), and that Bayesian optimization improves the AUC to some extent. The prediction accuracy for risky bus drivers reaches 98.66%, and operators can weigh the cost of the false alarm rate (false positive rate) against that of the hit rate (true positive rate) according to their own circumstances. The model results also reveal nonlinear effects: features such as vehicle service time and number of violations have a clearly nonlinear influence on accident risk, and driving age and penalties are also among the features with a marked effect.
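The trade-off between the false alarm rate and the hit rate mentioned in the abstract can be made concrete with a small ROC computation. The following is a minimal sketch in plain Python, not the authors' code: the labels, scores, and cost weights are invented for illustration, and the helper names (`roc_points`, `auc`, `cheapest_threshold`) are hypothetical.

```python
from typing import List, Tuple

RocPoint = Tuple[float, float, float]  # (threshold, false_alarm_rate, hit_rate)


def roc_points(y_true: List[int], scores: List[float]) -> List[RocPoint]:
    """Compute one ROC point per distinct score, flagging drivers with score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points: List[RocPoint] = [(float("inf"), 0.0, 0.0)]  # nobody flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def auc(points: List[RocPoint]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def cheapest_threshold(points: List[RocPoint], n_pos: int, n_neg: int,
                       cost_miss: float, cost_false_alarm: float) -> float:
    """Return the threshold with the lowest total cost of misses plus false alarms."""
    def total_cost(point: RocPoint) -> float:
        _, false_alarm_rate, hit_rate = point
        misses = (1.0 - hit_rate) * n_pos
        false_alarms = false_alarm_rate * n_neg
        return cost_miss * misses + cost_false_alarm * false_alarms
    return min(points, key=total_cost)[0]


# Invented toy data: 1 = driver had an accident, score = predicted risk.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

points = roc_points(y_true, scores)
n_pos = sum(y_true)
n_neg = len(y_true) - n_pos

print("AUC:", auc(points))
# A missed risky driver is assumed ten times as costly as a false alarm...
print("threshold (misses costly):", cheapest_threshold(points, n_pos, n_neg, 10.0, 1.0))
# ...and then the reverse assumption.
print("threshold (false alarms costly):", cheapest_threshold(points, n_pos, n_neg, 1.0, 10.0))
```

When misses are weighted heavily the chosen threshold drops, so more drivers are flagged; when false alarms are weighted heavily it rises. An operator would substitute its own cost estimates for the assumed weights.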

Key words: risky bus drivers, machine learning, accident risk, extreme gradient boosting (XGBoost), SHapley additive explanation (SHAP) value
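SHAP values, listed among the keywords, attribute a model's prediction to individual features using Shapley values from cooperative game theory. As a rough illustration of the idea only, the sketch below computes exact Shapley values by enumerating feature coalitions. It does not reproduce the paper's model or results: `toy_risk`, its coefficients, the feature names, and the baseline are all hypothetical, and a real analysis of a tree model would normally use a dedicated SHAP implementation rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are held at their baseline value.
    Cost grows exponentially with the number of features.
    """
    names = list(x)
    n = len(names)
    phi = {name: 0.0 for name in names}
    for name in names:
        others = [m for m in names if m != name]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {m: (x[m] if m in subset else baseline[m]) for m in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                phi[name] += weight * (model(with_feature) - model(without))
    return phi


def toy_risk(f: Features) -> float:
    """Hypothetical risk score: U-shaped in vehicle age, plus an interaction term."""
    age_term = 0.01 * (f["vehicle_age"] - 6.0) ** 2
    violation_term = 0.05 * f["violations"]
    interaction = 0.01 * max(f["vehicle_age"] - 6.0, 0.0) * f["violations"]
    return age_term + violation_term + interaction


baseline = {"vehicle_age": 6.0, "violations": 0.0}  # assumed reference driver
driver = {"vehicle_age": 12.0, "violations": 4.0}   # invented example driver

phi = shapley_values(toy_risk, driver, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the driver's score and the baseline score, which is the property that makes Shapley-based explanations additive. Because the toy score is quadratic in vehicle age, the attribution for that feature changes nonlinearly as the age moves away from the baseline.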