China Safety Science Journal, 2023, Vol. 33, Issue (8): 68-76. doi: 10.16265/j.cnki.issn1003-3033.2023.08.0038

• Safety engineering technology •

UAV distribution route and flight path collaborative planning based on deep reinforcement learning

WEI Ming1,2, SUN Yaru1, SUN Bo1, WANG Shengjie2

  1. School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China
  2. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan, Sichuan 618300, China
  • Received: 2023-03-22; Revised: 2023-06-15; Online: 2023-10-08; Published: 2024-02-28

Abstract:

To solve the collaborative planning problem of logistics UAV distribution sequences and flight paths, a bi-level programming model for the collaborative planning of UAV distribution routes and flight paths was proposed, in which the locations of depots, customers, and shelters on the ground, as well as differences in UAVs' falling costs, were considered on a rasterized GIS (Geographic Information System) map. According to the characteristics of the problem, a two-stage hybrid algorithm based on deep reinforcement learning (DRL) was designed. In the first stage, a DRL algorithm with an embedded A* algorithm was used to generate the sequential delivery routes along which multiple UAVs visit customers; on this basis, the feasible shortest flight path of each UAV was searched in the second stage. Finally, an example was used to compute the optimal UAV distribution routes and the corresponding flight path scheme and to analyze the influence of parameter changes on them, and the proposed algorithm was compared with traditional intelligent algorithms to verify the effectiveness and correctness of the model and algorithm. The results show that, for an example with 30 customer points in a 6 km × 6 km area, when the UAV falling cost threshold is set to 1.4, five UAVs with a total flight mileage of 52.5 km are needed to complete the delivery task. Compared with several traditional intelligent algorithms, the solution times rank, from shortest to longest, as DRL, GA (Genetic Algorithm), DE (Differential Evolution), and PSO (Particle Swarm Optimization). Especially for large-scale examples, the planning result of DRL has a lower UAV operating cost, and its average and worst solutions are far better than those of the intelligent algorithms.
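To make the second stage of the hybrid algorithm concrete, the sketch below shows a grid-based A* search of the kind the abstract describes: on a rasterized GIS map, cells whose falling cost exceeds a threshold (the example uses 1.4) are treated as infeasible, and the shortest feasible cell path between two ground points is returned. This is a minimal illustrative sketch under assumed data structures (the function name, the dictionary-based cost map, the unit move cost, and the Manhattan heuristic are not from the paper).

```python
# Hypothetical second-stage search: A* on a rasterized GIS grid where each
# cell carries a falling cost and cells above the threshold are no-fly cells.
import heapq

def astar_flight_path(grid, falling_cost, start, goal, threshold=1.4):
    """Return the shortest feasible cell path from start to goal, or None.

    grid         : (rows, cols) size of the raster
    falling_cost : dict mapping (r, c) -> falling cost of that cell (assumed)
    start, goal  : (r, c) cells, e.g. a depot and a customer location
    threshold    : cells with falling_cost > threshold are infeasible
    """
    rows, cols = grid

    def feasible(cell):
        r, c = cell
        return (0 <= r < rows and 0 <= c < cols
                and falling_cost.get(cell, 0.0) <= threshold)

    def heuristic(cell):                       # Manhattan lower bound on moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, cell)
    parent, best_g = {start: None}, {start: 0}

    while open_set:
        _, g, cell = heapq.heappop(open_set)
        if cell == goal:                       # reconstruct the path back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not feasible(nxt):
                continue
            ng = g + 1                         # assumed unit cost per grid move
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(open_set, (ng + heuristic(nxt), ng, nxt))
    return None                                # no feasible flight path exists
```

In a workflow matching the abstract, such a search would be called for each consecutive pair of points in the delivery sequence produced by the first-stage DRL policy, and the summed path lengths would give each UAV's total flight mileage.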

Key words: deep reinforcement learning (DRL), unmanned aerial vehicle (UAV), distribution route, flight path planning, bi-level programming model