China Safety Science Journal, 2025, Vol. 35, Issue (7): 192-200. doi: 10.16265/j.cnki.issn1003-3033.2025.07.1025

• Public Safety •

Cluster analysis of autonomous driving traffic accidents based on K-means and LCA

QIAO Jianfeng¹, WANG Yanan¹, LYU Shuran¹, WANG Ting¹, XIA Xuefeng²

  1. School of Management Engineering, Capital University of Economics and Business, Beijing 100070, China
  2. International Branch, China Petroleum Corporation Logging Limited Company, Beijing 100101, China
  • Received: 2025-03-04  Revised: 2025-05-09  Published: 2025-07-28
  • About the authors:

    QIAO Jianfeng (1977-), male, born in Hohhot, Inner Mongolia, Ph.D., associate professor. His research focuses on text mining of safety data and on safety risk early warning and assessment. E-mail:

    LYU Shuran, Professor

    WANG Ting, Associate Professor



Abstract:

A statistical analysis of individual accident-description factors is not enough to reveal the underlying patterns of road traffic accidents involving autonomous vehicles (AVs). The comprehensive latent categories that emerge from interactions among multiple factors must also be uncovered. AV accident data contain both structured information and narrative text, so a novel type-identification approach combining K-means clustering with latent class analysis (LCA) was proposed. First, K-means was used to extract key information from the narrative text. Second, this information was fed into the LCA model. This overcomes a limitation of LCA, which can otherwise use only the structured information in existing accident reports. Finally, the effectiveness of the combined approach was verified using 437 AV traffic accidents in California, USA. The results show that AV accidents fall mainly into four comprehensive types. They also show that combining K-means with LCA enables efficient clustering of structured accident data supplemented with narrative text.

Key words: K-means, latent class analysis (LCA), autonomous driving, cluster analysis, autonomous vehicles (AV), traffic accidents
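The abstract outlines a two-stage pipeline. In the first stage, K-means condenses the narrative text into categories. In the second stage, those categories join the structured fields as input to LCA. The Python below is a minimal, hypothetical sketch of that idea and is not the authors' implementation. It makes several assumptions that the source does not state. Narratives are encoded as L2-normalized bag-of-words vectors, because the text representation used in the study is not given here. Each K-means cluster label enters the LCA model as one extra categorical indicator. LCA is fitted with a plain expectation-maximization (EM) loop. The helper names `bag_of_words`, `kmeans` and `lca` are illustrative inventions. The toy records are also invented and are not taken from the 437 California accidents.

```python
import math
import random


def bag_of_words(texts):
    """Hypothetical text encoding: L2-normalized term counts over a shared vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for w in toks:
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def lca(data, n_classes, n_iter=200, seed=0, eps=1e-6):
    """Latent class analysis for categorical indicators, fitted by EM.

    Model: P(row) = sum_c pi[c] * prod_j rho[c][j][row[j]].
    Returns the class proportions pi and the modal latent class of each row.
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    levels = [sorted({row[j] for row in data}) for j in range(n_items)]
    pi = [1.0 / n_classes] * n_classes
    rho = []  # rho[c][j] maps each level of item j to P(level | class c)
    for _ in range(n_classes):
        per_item = []
        for j in range(n_items):
            w = [rng.random() + 0.5 for _ in levels[j]]
            s = sum(w)
            per_item.append({v: wi / s for v, wi in zip(levels[j], w)})
        rho.append(per_item)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership of each row, computed in log space.
        resp = []
        for row in data:
            logs = [math.log(pi[c]) + sum(math.log(rho[c][j][row[j]]) for j in range(n_items))
                    for c in range(n_classes)]
            m = max(logs)
            w = [math.exp(x - m) for x in logs]
            s = sum(w)
            resp.append([x / s for x in w])
        # M-step: re-estimate class sizes and item-response probabilities.
        for c in range(n_classes):
            nc = sum(r[c] for r in resp)
            pi[c] = (nc + eps) / (n + n_classes * eps)  # smoothed so log(pi) stays finite
            for j in range(n_items):
                counts = {v: eps for v in levels[j]}
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                total = sum(counts.values())
                rho[c][j] = {v: cnt / total for v, cnt in counts.items()}
    labels = [max(range(n_classes), key=lambda c: r[c]) for r in resp]
    return pi, labels


# Hypothetical toy records (narrative, lighting, collision type); NOT the study's data.
records = [
    ("av stopped at signal and was rear ended", "daylight", "rear_end"),
    ("av stopped in queue and was rear ended", "daylight", "rear_end"),
    ("vehicle rear ended av while av was stopped", "dark", "rear_end"),
    ("av turning left sideswiped by passing vehicle", "daylight", "sideswipe"),
    ("passing vehicle sideswiped av during lane change", "dark", "sideswipe"),
    ("av changing lanes sideswiped by vehicle", "dark", "sideswipe"),
]
# Step 1: K-means on the narratives turns free text into a categorical "text cluster".
text_cluster = kmeans(bag_of_words([r[0] for r in records]), k=2)
# Step 2 (assumed encoding): the cluster label joins the structured fields as one more indicator.
lca_input = [(light, kind, f"text_{t}") for (_, light, kind), t in zip(records, text_cluster)]
pi, classes = lca(lca_input, n_classes=2)
print("class proportions:", [round(p, 2) for p in pi])
print("latent class per record:", classes)
```

To run the sketch on real data, replace the toy lists with actual report fields and narratives and keep the same two calls. In practice, the number of text clusters and the number of latent classes are usually chosen with fit criteria rather than fixed in advance as they are here. Examples are silhouette scores for K-means and BIC for LCA.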
