UAV distribution route and flight path collaborative planning based on deep reinforcement learning

doi:10.16265/j.cnki.issn1003-3033.2023.08.0038

Abstract

To jointly optimize the distribution routes and flight paths of logistics unmanned aerial vehicles (UAVs), a bi-level programming model for the collaborative planning of UAV distribution routes and flight paths is proposed. The model is built on a rasterized geographic information system (GIS) and considers the spatial distribution of depots, customers and ground shelters, as well as the differences in the cost of a UAV falling on them. According to the characteristics of the problem, a two-stage hybrid algorithm based on deep reinforcement learning (DRL) is designed. In the first stage, the DRL algorithm generates the distribution routes, that is, the order in which multiple UAVs visit customers. The A* algorithm is embedded in the DRL algorithm and, in the second stage, searches for the feasible shortest flight path of each UAV. A numerical example is used to obtain the best UAV distribution routes and flight paths and to analyze the influence of parameter changes on the model, and the algorithm is compared with traditional intelligent algorithms to verify the soundness and effectiveness of the proposed model. The results show that, for an example with 30 customer points in a 6 km×6 km area, when the UAV falling cost threshold is set to 1.4, 5 UAVs with a total flight distance of 52.5 km are needed to complete the delivery task. Compared with several traditional intelligent algorithms, the solution times rank from shortest to longest as DRL, genetic algorithm (GA), differential evolution (DE) and particle swarm optimization (PSO). On large-scale examples, the plans produced by DRL have lower UAV operating costs, and their average and worst solutions are far better than those of the intelligent algorithms.

Key words: deep reinforcement learning (DRL), unmanned aerial vehicle (UAV), distribution route, flight path planning, bi-level programming model
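The second stage described in the abstract searches each UAV's feasible shortest flight path on the rasterized map with A*. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: it assumes an 8-connected grid, treats every cell whose falling cost exceeds the threshold as impassable, and uses an octile-distance heuristic. The function name `astar`, the `risk` grid and the demo values are hypothetical.

```python
import heapq
import math


def astar(risk, start, goal, threshold):
    """Shortest 8-connected path that avoids cells with risk above threshold.

    risk is a 2D list of per-cell falling costs (hypothetical values).
    Returns (length_in_cells, path), or (math.inf, []) if the goal is unreachable.
    """
    rows, cols = len(risk), len(risk[0])

    def passable(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and risk[r][c] <= threshold

    def heuristic(cell):
        # Octile distance: a lower bound for moves of cost 1 and sqrt(2).
        dr, dc = abs(cell[0] - goal[0]), abs(cell[1] - goal[1])
        return max(dr, dc) + (math.sqrt(2) - 1) * min(dr, dc)

    if not (passable(start) and passable(goal)):
        return math.inf, []

    open_heap = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}      # cheapest known cost to reach each cell
    parent = {start: None}   # back-pointers for path reconstruction
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return g, path[::-1]
        if g > best[cell]:
            continue  # stale heap entry, a cheaper route was found later
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nxt = (cell[0] + dr, cell[1] + dc)
                if not passable(nxt):
                    continue
                # Simplification: diagonal moves may cut past blocked corners.
                new_g = g + math.hypot(dr, dc)
                if new_g < best.get(nxt, math.inf):
                    best[nxt] = new_g
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return math.inf, []


# Hypothetical 3x3 map: the centre cell has a falling cost above the threshold.
demo_risk = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
]
length, path = astar(demo_risk, (0, 0), (2, 2), threshold=1.4)
print(round(length, 3), path)
```

In this sketch a lower threshold blocks more cells, so the shortest path can only stay the same or get longer. Multiplying the returned length by the cell size would give a distance in km, and the cell size is an input the sketch leaves to the caller.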

WEI Ming, SUN Yaru, SUN Bo, WANG Shengjie. UAV distribution route and flight path collaborative planning based on deep reinforcement learning[J]. China Safety Science Journal, 2023, 33(8): 68-76.

Figures/Tables 12

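Table 1

REINFORCE algorithm procedure

Step	Procedure
1	Initialize the actor network with random weights $\theta$ and the critic network with random weights $\phi$
2	for iteration = 1, 2, … (outer training loop)
3	Reset the gradients: $d\theta \leftarrow 0,\ d\phi \leftarrow 0$
4	Generate V problem instances by probabilistic sampling
5	for v = 1, 2, …, V, where v is the instance index
6	Initialize the decision counter $\omega \leftarrow 0$
7	repeat:
8	Select $y_{\omega+1}^{i}$ according to $p(y_{\omega+1}^{i} \mid Y_{\omega}^{i}, S_{\omega}^{i})$ to construct a solution of the UAV distribution route planning problem
9	Observe $S_{\omega+1}^{i}$
10	$\omega \leftarrow \omega + 1$
11	until the demands of all customer points in the customer set I are satisfied
12	Input the UAV flyable-path parameters computed by the A* algorithm
13	Initialize the marker for UAV operation beyond its endurance: $\bar{\omega} \leftarrow 0$
14	if $\sum_{\omega \in W} F(l_{i}^{\omega}, d_{ij}^{\omega}) \geq p_{\max}^{k}$:
15	$\bar{\omega} \leftarrow \bar{\omega} + 1$
16	end if
17	Compute the reward of instance v: $\delta_{v} = -(\text{Eq. (2)} + \bar{\omega} \cdot M),\ v \in V$
18	end for
19	$d\theta \leftarrow \frac{1}{V} \sum_{v=1}^{V} \left(\delta_{v} - V(S_{0}^{v}; \phi)\right) \nabla_{\theta} \ln P(Y^{v} \mid S_{0}^{v})$
20	$d\phi \leftarrow \frac{1}{V} \sum_{v=1}^{V} \nabla_{\phi} \left(\delta_{v} - V(S_{0}^{v}; \phi)\right)^{2}$
21	Update $\theta$ and $\phi$ using the gradients computed in steps 19 and 20
22	end for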
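Steps 13 to 20 of the procedure above can be illustrated with a short sketch. It is not the authors' code: it assumes that the policy and critic gradients for each instance are already available as plain lists of floats, and the names `instance_reward`, `batch_gradients`, `consumption`, `capacity` and `big_m` are hypothetical stand-ins for the objective value, $F(\cdot)$, $p_{\max}^{k}$ and $M$.

```python
def instance_reward(route_cost, consumption, capacity, big_m=1e4):
    """Steps 13-17: negative cost plus a big-M penalty if endurance is exceeded.

    route_cost stands in for the objective value of the instance,
    consumption for the summed F(.) term and capacity for the limit
    it is compared against (assumed meanings).
    """
    overrun = 1 if consumption >= capacity else 0
    return -(route_cost + overrun * big_m)


def batch_gradients(rewards, baselines, grad_logp, grad_baseline):
    """Steps 19-20: averaged gradients over a batch of V instances.

    rewards[v]       : reward delta_v of instance v
    baselines[v]     : critic estimate V(S_0^v; phi)
    grad_logp[v]     : gradient of ln P(Y^v | S_0^v) with respect to theta
    grad_baseline[v] : gradient of V(S_0^v; phi) with respect to phi
    """
    batch = len(rewards)
    d_theta = [0.0] * len(grad_logp[0])
    d_phi = [0.0] * len(grad_baseline[0])
    for v in range(batch):
        advantage = rewards[v] - baselines[v]
        # Step 19: advantage-weighted score-function gradient.
        for j, g in enumerate(grad_logp[v]):
            d_theta[j] += advantage * g / batch
        # Step 20: d/dphi (delta - V)^2 = -2 (delta - V) dV/dphi.
        for j, g in enumerate(grad_baseline[v]):
            d_phi[j] += -2.0 * advantage * g / batch
    return d_theta, d_phi


# Two hypothetical instances; the second one exceeds its endurance limit.
rewards = [
    instance_reward(10.0, consumption=5.0, capacity=8.0),
    instance_reward(12.0, consumption=9.0, capacity=8.0, big_m=100.0),
]
d_theta, d_phi = batch_gradients(
    rewards,
    baselines=[-11.0, -100.0],
    grad_logp=[[1.0, 0.0], [0.5, 2.0]],
    grad_baseline=[[1.0], [2.0]],
)
print(rewards, d_theta, d_phi)
```

Subtracting the critic's estimate from the reward leaves the expected policy gradient unchanged while reducing its variance, which is the usual reason for pairing REINFORCE with a learned baseline.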

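Table 3

Experimental parameter settings

Parameter	Value
Number of UAV rotors	6
UAV frame weight/kg	26
UAV battery weight/kg	10
Maximum range/km	15
Design speed/(m·s^-1)	16
Maximum payload $l_{k}^{\max}$/kg	10
Fixed cost c_k/(yuan·UAV^-1)	2000
Waiting-time penalty c_1/(yuan·(60 s)^-1)	2
Lateness penalty c_2/(yuan·occurrence^-1)	100
Unit-distance wear cost c_d/(yuan·km^-1)	1.5
Electricity price c_p/(yuan·kW^-1·h^-1)	2
Common UAV departure time	06:00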

Customer (coordinates)	Time window	Demand/kg	Customer (coordinates)	Time window	Demand/kg
D1(0,27)	[06:03,06:08]	1	D16(14,5)	[06:01,06:06]	1
D2(1,12)	[06:03,06:08]	2	D17(15,25)	[06:05,06:10]	1
D3(1,4)	[06:05,06:10]	1	D18(16,13)	[06:00,06:05]	2
D4(4,22)	[06:02,06:07]	3	D19(17,20)	[06:04,06:09]	2
D5(5,18)	[06:05,06:10]	1	D20(18,17)	[06:05,06:10]	1
D6(6,17)	[06:04,06:09]	2	D21(21,17)	[06:00,06:05]	2
D7(7,11)	[06:06,06:11]	1	D22(22,21)	[06:03,06:08]	1
D8(8,8)	[06:05,06:10]	1	D23(23,15)	[06:04,06:09]	1
D9(9,2)	[06:02,06:07]	1	D24(24,21)	[06:02,06:07]	2
D10(9,19)	[06:00,06:05]	2	D25(25,25)	[06:03,06:08]	1
D11(10,2)	[06:03,06:08]	1	D26(26,4)	[06:00,06:05]	2
D12(10,12)	[06:01,06:06]	1	D27(27,7)	[06:03,06:08]	2
D13(10,23)	[06:04,06:09]	1	D28(27,24)	[06:02,06:07]	1
D14(12,5)	[06:02,06:07]	2	D29(28,13)	[06:03,06:08]	3
D15(13,7)	[06:00,06:05]	3	D30(29,7)	[06:02,06:07]	1

UAV	Route and schedule	Range/km	Energy consumption/(kW·h)	Waiting time/s	Lateness/s
UAV1	D0(06:00)-D25(06:03:57)-D28(06:05:30)-D24(06:06:53)-D22(06:07:18)-D19(06:08:26)-D17(06:09:38)-D13(06:10:52)-D0(06:13:40)	11.637	0.95	92	0
UAV2	D0(06:00)-D30(06:03:58)-D27(06:05:25)-D26(06:07:43)-D29(06:08:46)-D23(06:09:58)-D0(06:12:42)	10.643	0.87	97	0
UAV3	D0(06:00)-D20(06:01:41)-D21(06:02:19)-D18(06:03:41)-D11(06:06:30)-D9(06:06:43)-D14(06:07:35)-D16(06:08:35)-D0(06:10:13)	9.423	0.78	25	0
UAV4	D0(06:00)-D3(06:02:59)-D8(06:05:48)-D15(06:06:56)-D7(06:08:35)-D12(06:09:43)-D2(06:11:46)-D0(06:14:13)	12.223	1.01	89	223
UAV5	D0(06:00)-D10(06:02:01)-D4(06:03:19)-D1(06:04:42)-D5(06:07:19)-D6(06:07:36)-D0(06:09:17)	8.623	0.70	18	0
Total cost: 10 198.12 yuan

Time-window constraint	Number of UAVs	UAV range/km	UAV energy consumption/(kW·h)	Lateness/s	Waiting time/s	Total operating cost/yuan
Without	5	48.97	7.90	0	0	10 087.36
With	5	52.55	8.59	223	321	10 198.12

Falling cost threshold R	Number of UAVs	UAV range/km	UAV energy consumption/(kW·h)	Lateness/s	Waiting time/s	Total operating cost/yuan
None	5	50.57	8.03	0	718	10 108.82
1.4	5	52.55	8.59	223	321	10 198.12
1.1	5	55.94	9.18	292	529	10 206.77

Number of customers	Algorithm	Best solution	Average solution	Worst solution	Standard deviation	Solution time/s
30	DRL	10 198.12	11 831.31	12 342.66	40.56	12.79
30	GA	12 337.66	17 389.18	58 468.23	5 449.34	47.56
30	DE	14 153.70	29 607.82	50 232.42	5 904.81	48.69
30	PSO	10 084.57	12 914.68	42 128.75	3 356.54	890.74
50	DRL	26 983.94	28 796.38	30 673.28	342.83	28.42
50	GA	26 754.88	53 133.30	94 844.94	10 935.59	84.73
50	DE	38 706.89	87 836.36	120 707.64	13 384.37	87.65
50	PSO	44 165.52	115 964.70	224 003.45	25 473.53	1 680.83
100	DRL	41 657.08	48 641.84	55 687.67	810.69	40.28
100	GA	60 321.98	89 955.07	243 263.89	35 319.22	140.41
100	DE	87 507.85	63 267.91	209 413.78	17 814.59	200.48
100	PSO	88 987.01	202 019.22	337 613.34	27 817.30	3 598.32
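To make the comparison in the table above easier to read, the sketch below expresses each baseline's average solution and solution time as a multiple of the DRL values for the same number of customers. The numbers are copied from the table; the names `results` and `relative_to_drl` are hypothetical, and the ratios are plain arithmetic rather than results reported by the authors.

```python
# (number of customers, algorithm) -> (average solution, solution time in s),
# copied from the comparison table.
results = {
    (30, "DRL"): (11831.31, 12.79),
    (30, "GA"): (17389.18, 47.56),
    (30, "DE"): (29607.82, 48.69),
    (30, "PSO"): (12914.68, 890.74),
    (50, "DRL"): (28796.38, 28.42),
    (50, "GA"): (53133.30, 84.73),
    (50, "DE"): (87836.36, 87.65),
    (50, "PSO"): (115964.70, 1680.83),
    (100, "DRL"): (48641.84, 40.28),
    (100, "GA"): (89955.07, 140.41),
    (100, "DE"): (63267.91, 200.48),
    (100, "PSO"): (202019.22, 3598.32),
}


def relative_to_drl(table):
    """Return {(size, algorithm): (average ratio, time ratio)} against DRL."""
    ratios = {}
    for (size, algorithm), (average, seconds) in table.items():
        if algorithm == "DRL":
            continue
        drl_average, drl_seconds = table[(size, "DRL")]
        ratios[(size, algorithm)] = (average / drl_average, seconds / drl_seconds)
    return ratios


for key, (avg_ratio, time_ratio) in sorted(relative_to_drl(results).items()):
    print(key, f"average x{avg_ratio:.2f}", f"time x{time_ratio:.1f}")
```

A ratio above 1 means the baseline has a higher average cost or a longer solution time than DRL at that problem size.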