China Safety Science Journal ›› 2025, Vol. 35 ›› Issue (1): 112-119. doi: 10.16265/j.cnki.issn1003-3033.2025.01.0540

• Safety Engineering Technology •

Research on vehicle hazardous cut-in strategy for autonomous driving tests

ZHOU Yang1,2, CHEN Yunxing2,3,**, WU Ling1

  1. 1 School of Vehicle Engineering, Xi'an Aeronautical Institute, Xi'an Shaanxi 710077, China
     2 Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang Hubei 441053, China
     3 School of Automotive and Traffic Engineering, Hubei University of Arts and Science, Xiangyang Hubei 441053, China
  • Received: 2024-08-11; Revised: 2024-10-20; Published: 2025-01-28
  • Corresponding author:
    **CHEN Yunxing (1987—), male, from Jingmen, Hubei; Ph.D., associate professor. His research focuses on driving behavior perception and intelligent driving technology. E-mail:
  • About the authors:

    ZHOU Yang (1989—), male, from Hanzhong, Shaanxi; Ph.D., associate professor. His research focuses on human-vehicle-road system safety and autonomous driving testing. E-mail:

    CHEN Yunxing, associate professor

    WU Ling, associate professor

  • Funding:
    National Natural Science Foundation of China (51908054); Natural Science Basic Research Program of the Shaanxi Provincial Department of Science and Technology (2024JC-YBMS-301); Hubei Province Technology Innovation Program Major Science and Technology Project (2024BAA011); Open Fund of Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle (ZDSYS202310)



Abstract:

To improve the interaction capability of traffic vehicles in cut-in test scenarios, a method for constructing a vehicle hazardous cut-in strategy based on deep reinforcement learning was proposed. Firstly, a simulation environment was built on the Scalable Multi-Agent Reinforcement Learning Training School (SMARTS) platform. Then, the twin delayed deep deterministic policy gradient (TD3) algorithm was adopted to train an agent to perform hazardous cut-ins on a randomly selected target vehicle, and the algorithm was compared with the proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms. The trained model was tested in seven scenarios with different traffic densities. Finally, a multi-agent testing environment was built, and the trained model was applied to validate intelligent driving strategies. The results show that the hazardous cut-in success rate of the TD3 model reaches 80.35% during training, outperforming both comparison methods. In model testing, except for the 2 700 vehicles/h scenario, the model achieves a hazardous cut-in success rate above 80% in the other three test scenarios that were not used in training, demonstrating good generalization ability. Meanwhile, 95% of the time-to-collision (TTC) values between the ego vehicle and the target vehicle at the moment of lane change fall within 0-6 s, with the proportions in the intervals (0, 2], (2, 4], and (4, 6] s being 60%, 30%, and 5%, respectively, covering test conditions with different collision risks. In the validation of intelligent driving strategies, a traffic vehicle controlled by the trained model can actively cut in ahead of the vehicle under test, exposing it to rear-end collision risk and helping to identify safety vulnerabilities in intelligent driving strategies.
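The abstract brackets cut-in risk by the time-to-collision at the moment of lane change. As a point of reference only, the standard TTC definition (gap divided by closing speed, undefined when the follower is not closing in) can be sketched as follows; the function and variable names are illustrative and not taken from the paper:

```python
def time_to_collision(gap_m: float, v_follow_mps: float, v_lead_mps: float) -> float:
    """TTC = longitudinal gap / closing speed; infinite when the follower is not closing in."""
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0.0:
        return float("inf")
    return gap_m / closing_speed

# A cut-in 15 m ahead of a target that is closing at 5 m/s gives TTC = 3 s,
# which falls in the (2, 4] s bracket reported in the abstract.
ttc = time_to_collision(gap_m=15.0, v_follow_mps=25.0, v_lead_mps=20.0)
```

Under this definition, the (0, 2] s bracket corresponds to near-imminent rear-end conflicts, while (4, 6] s leaves the following vehicle substantial reaction margin.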

Key words: autonomous driving, vehicle hazardous cut-in, virtual tests, hazardous scenarios, reinforcement learning
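TD3, the algorithm the study adopts over PPO and DDPG, differs from DDPG mainly through clipped double-Q learning and target-policy smoothing. A schematic NumPy sketch of the TD3 bootstrap-target computation is given below; the networks are stand-in callables and all names and hyperparameter values are illustrative, not taken from the paper:

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Compute the TD3 target y = r + gamma * (1 - done) * min(Q1', Q2')."""
    # Target-policy smoothing: perturb the target action with clipped noise.
    a_next = actor_target(next_state)
    noise = np.clip(np.random.normal(0.0, noise_std, size=np.shape(a_next)),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -act_limit, act_limit)
    # Clipped double-Q: bootstrap from the smaller of the two target critics
    # to curb the overestimation bias that affects DDPG.
    q_min = np.minimum(q1_target(next_state, a_next),
                       q2_target(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_min
```

Both critics regress toward this shared target; taking the elementwise minimum of the two target critics is what distinguishes the update from a plain DDPG target.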

CLC number: