China Safety Science Journal ›› 2023, Vol. 33 ›› Issue (2): 23-30. doi: 10.16265/j.cnki.issn1003-3033.2023.02.0034

• Safety social science and safety management •

Research on accident risk identification and influencing factors of bus drivers based on machine learning

ZHU Tong1, QIN Dan1, WEI Wen1, REN Jie1, FENG Yidong2,**

  1 College of Transportation Engineering, Chang'an University, Xi'an Shaanxi 710064, China
  2 Research Institute of Highway Ministry of Transport, Beijing 100088, China
  • Received: 2022-10-17 Revised: 2022-12-21 Online: 2023-02-28 Published: 2023-08-28

Abstract:

To identify bus drivers who are likely to cause accidents, a data set was assembled by combining the Bus Safety Management System database, the Baidu application programming interface (API), and web crawling. Missing values were imputed with the K-Nearest Neighbor algorithm, yielding data on 1893 drivers across 42 bus lines. The basic feature variables covered driver, vehicle, and route characteristics, violations, accidents, and management, and derived features were constructed from them. An integrated feature selection method was designed that combines recursive feature elimination, logistic regression with penalty terms, random forest, and other techniques. Models were built with six machine learning methods, including XGBoost, and their hyper-parameters were tuned by Bayesian optimization. The results indicate that, among the six classification models, the XGBoost model achieves the best area under the receiver operating characteristic (ROC) curve (AUC), and that Bayesian optimization improves the AUC to a certain extent. The prediction accuracy for accident drivers reaches 98.66%, and the operating unit can weigh the false positive rate against the true positive rate according to its own situation. Moreover, the model results reveal nonlinear effects of the features. Vehicle service time, driving age, violations, punishment, and other features have a pronounced effect on accident risk.

Key words: risky bus drivers, machine learning, accident risk, extreme gradient boosting (XGBoost), SHapley additive explanation (SHAP) value
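
The abstract notes that an operating unit can weigh the false positive rate against the true positive rate. The following is a minimal sketch of that trade-off, not the authors' implementation: it computes ROC points and the AUC from predicted risk scores, then picks the operating point with the highest true positive rate under a cap on the false positive rate. The function names (`roc_points`, `auc`, `pick_threshold`) and the toy labels and scores are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, false positive rate, true positive rate)


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """ROC points obtained by lowering the decision threshold over the sorted scores.

    A driver is flagged as risky when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(float("inf"), 0.0, 0.0)]  # nothing flagged yet
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all drivers tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((threshold, fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def pick_threshold(points: List[Point], max_fpr: float) -> Point:
    """Operating point with the highest TPR whose FPR does not exceed max_fpr."""
    feasible = [p for p in points if p[1] <= max_fpr]
    return max(feasible, key=lambda p: (p[2], -p[1]))


# Toy example (made-up values): 1 = driver had an accident, 0 = no accident.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

points = roc_points(labels, scores)
print("AUC:", auc(points))
print("Operating point with FPR <= 0.25:", pick_threshold(points, max_fpr=0.25))
```

A stricter cap on the false positive rate flags fewer safe drivers by mistake but also misses more accident-prone ones; a unit with more capacity for follow-up interventions could raise `max_fpr` to catch more risky drivers.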