China Safety Science Journal, 2023, Vol. 33, Issue (8): 68-76. doi: 10.16265/j.cnki.issn1003-3033.2023.08.0038

• Safety Engineering Technology •

UAV distribution route and flight path collaborative planning based on deep reinforcement learning

WEI Ming1,2, SUN Yaru1, SUN Bo1, WANG Shengjie2

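  1. School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China
  2. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan Sichuan 618300, China
  • Received: 2023-03-22 Revised: 2023-06-15 Published: 2023-10-08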
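  • About the authors:

    WEI Ming (born 1984), male, from Wuhu, Anhui; Ph.D., professor; research interests: transportation planning and management. E-mail:

    SUN Bo, associate professor

    WANG Shengjie, lecturer

  • Funding:
    Humanities and Social Sciences Project of the Ministry of Education (20YJCZH176); Open Fund of the Key Laboratory of Flight Techniques and Flight Safety (FZ2021KF06)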

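Abstract:

To optimize the distribution routes and collaborative flight path planning of logistics unmanned aerial vehicles (UAVs), a bilevel programming model for the collaborative planning of UAV distribution routes and flight paths was proposed on a rasterized geographic information system (GIS), taking into account the spatial distribution of dispatch centers (depots), customers and ground shelters, as well as the differences in their fall costs. Based on the characteristics of the problem, a two-stage hybrid algorithm based on deep reinforcement learning (DRL) was designed. In the first stage, the DRL algorithm, with the A* algorithm embedded in it, generates the distribution routes, i.e. the order in which multiple UAVs visit the customers; on this basis, the second stage searches for the feasible shortest flight path of each UAV. A numerical example was used to obtain the optimal UAV distribution routes and flight path scheme and to analyze how parameter changes affect the model, and the results were compared with those of traditional intelligent algorithms to verify the soundness and effectiveness of the proposed model. The results show that, for an example with 30 customer points in a 6 km × 6 km area, when the fall cost threshold is set to 1.4, 5 UAVs are needed to complete the delivery task, with a total flight mileage of 52.5 km. Compared with several traditional intelligent algorithms, the solution times rank, from shortest to longest, as DRL, genetic algorithm (GA), differential evolution (DE) and particle swarm optimization (PSO). On large-scale instances, the DRL plans incur lower UAV operating costs, and their average and worst solutions are far better than those of the intelligent algorithms.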
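The second stage described above, searching each UAV's feasible shortest flight path on the rasterized map, can be illustrated with a minimal A* sketch. It is not the authors' implementation, and the abstract does not give the model's exact formulation, so several assumptions are made and should be read as hypothetical: each raster cell carries a fall cost, a cell is flyable only if its fall cost does not exceed the threshold, moves are 8-connected with Euclidean step lengths, and the stage-1 (DRL) output for one UAV is represented simply as a given depot-to-depot visiting order. The names `astar`, `route_length`, `cell_km` and the toy grid are illustrative only.

```python
import heapq
import math

# Hypothetical rasterized map: grid[r][c] is the fall cost of cell (r, c).
Grid = list[list[float]]
Cell = tuple[int, int]

# 8-connected moves (row step, column step); diagonal steps cost sqrt(2).
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def astar(grid: Grid, start: Cell, goal: Cell, threshold: float) -> float:
    """Shortest feasible path length in cell units, or math.inf if none exists.

    Assumption: a cell is flyable only if its fall cost is <= threshold.
    Diagonal moves are allowed even between two blocked cells (no corner check).
    """
    rows, cols = len(grid), len(grid[0])

    def flyable(cell: Cell) -> bool:
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] <= threshold

    def heuristic(cell: Cell) -> float:
        # Straight-line distance never overestimates an 8-connected path length.
        return math.dist(cell, goal)

    if not (flyable(start) and flyable(goal)):
        return math.inf
    best = {start: 0.0}                      # best known cost-so-far per cell
    frontier = [(heuristic(start), start)]   # priority queue ordered by f = g + h
    closed = set()
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            return best[cur]
        if cur in closed:
            continue                         # stale queue entry
        closed.add(cur)
        for dr, dc in MOVES:
            nxt = (cur[0] + dr, cur[1] + dc)
            if nxt in closed or not flyable(nxt):
                continue
            cost = best[cur] + math.hypot(dr, dc)
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                heapq.heappush(frontier, (cost + heuristic(nxt), nxt))
    return math.inf


def route_length(grid: Grid, route: list[Cell], threshold: float,
                 cell_km: float = 1.0) -> float:
    """Chain A* legs along one UAV's visiting order (depot -> customers -> depot)."""
    total = sum(astar(grid, a, b, threshold) for a, b in zip(route, route[1:]))
    return total * cell_km


if __name__ == "__main__":
    # Toy 5x5 map: column 2 has a high-fall-cost strip except in the bottom row.
    grid = [[1.0] * 5 for _ in range(5)]
    for r in range(4):
        grid[r][2] = 2.0
    trip = [(0, 0), (0, 4), (0, 0)]          # depot -> one customer -> depot
    print(f"{route_length(grid, trip, threshold=1.4):.3f}")  # 19.314 (detour via row 4)
    print(f"{route_length(grid, trip, threshold=2.5):.3f}")  # 8.000 (straight line)
```

In the toy run, lowering the threshold from 2.5 to 1.4 turns the high-cost strip into a no-fly zone and forces a detour, which is the kind of trade-off between flight mileage and fall cost that the threshold parameter controls. Treating the threshold as a hard per-cell cutoff is only one possible reading; if the model instead penalizes fall cost in the objective, the step cost in `astar` would also include the cell's fall cost.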

Key words: deep reinforcement learning (DRL), unmanned aerial vehicle (UAV), distribution route, flight path planning, bilevel programming model
