China Safety Science Journal, 2024, Vol. 34, Issue (7): 123-131. doi: 10.16265/j.cnki.issn1003-3033.2024.07.0228

• Safety Engineering Technology •

Integrated avionics system safety optimization method based on deep reinforcement learning

ZHAO Changxiao1,2, LI Daojun1, SUN Yixuan1, JING Peng1, TIAN Yi1,2,**

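  1 School of Safety Engineering and Science, Civil Aviation University of China, Tianjin 300300, China
  2 Key Laboratory of Civil Aviation Airworthiness Certification Technology, Civil Aviation University of China, Tianjin 300300, China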
  • Received: 2024-01-18 Revised: 2024-04-21 Published: 2024-07-28
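  • Corresponding author:
    ** TIAN Yi (born 1983), male, from Hanzhong, Shaanxi; master's degree; associate research fellow. Research interests: airworthiness certification of airborne electronic hardware, aviation application-specific integrated circuit design, and computer architecture. E-mail: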
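  • About the author:
    ZHAO Changxiao (born 1989), male, from Linqing, Shandong; PhD; associate professor. Research interests: performance evaluation and airworthiness certification technology for integrated avionics systems. E-mail: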

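  • Funding:
    National Key Research and Development Program of China (2021YFB1600601); Tianjin Higher Education Institutions Graduate Education Reform Research Program (TJYG135); Civil Aviation University of China Graduate Research and Innovation Funding Project (2023YJSKC09015)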



Abstract:

Traditional safety design methods based on manual inspection struggle to cope with the explosion of candidate hosting schemes caused by the large-scale integration of avionics systems. To address this problem, an avionics system partition model, a task model, and a safety criticality level quantification model were constructed, and the safety-aware integrated design optimization problem was formulated as a Markov decision process (MDP). An optimization method using the Soft Actor-Critic (SAC) algorithm, which builds on the Actor-Critic framework, was then proposed. To characterize the correlation between SAC parameter selection and training results, the sensitivity of the algorithm parameters was studied. To verify the advantage of the SAC-based method in optimizing safety-aware integrated design, comparison experiments were carried out against the Deep Deterministic Policy Gradient (DDPG) algorithm and a traditional allocation algorithm. The results show that, under the best parameter combination, the maximum reward of the SAC algorithm after convergence is nearly 8% higher than under the other parameter combinations, and the convergence time is nearly 16.6% shorter. Compared with the DDPG algorithm and the traditional allocation algorithm under the same parameter settings, the SAC-based method achieves maximum improvements of 62%, 7464%, 8370%, 2123% and 775% in maximum reward, cumulative constraint violation rate, partition risk balance, partition resource utilization and solution time, respectively.

Key words: deep reinforcement learning, integrated modular avionics, safety, optimization method, Markov decision process (MDP), integrated design
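
The abstract's idea of casting task-to-partition hosting as an MDP can be illustrated with a toy environment. The following is only a minimal sketch under stated assumptions, not the authors' model: the names `Task`, `AllocationEnv`, `least_loaded_policy` and `rollout`, the state layout, and the reward weights (10.0 for overload, 0.1 for risk imbalance) are all hypothetical, and a simple least-loaded rule stands in for a trained SAC agent.

```python
from dataclasses import dataclass


@dataclass
class Task:
    load: float       # share of a partition's capacity the task needs (assumed unit)
    criticality: int  # quantified safety criticality level; higher = more critical


class AllocationEnv:
    """Toy episodic MDP: tasks are placed into partitions one at a time."""

    def __init__(self, tasks, n_partitions, capacity):
        self.tasks = list(tasks)
        self.n_partitions = n_partitions
        self.capacity = capacity
        self.reset()

    def reset(self):
        self.loads = [0.0] * self.n_partitions
        self.risk = [0] * self.n_partitions
        self.next_task = 0
        return self.state()

    def state(self):
        # State: per-partition load and accumulated criticality, plus next task index.
        return tuple(self.loads), tuple(self.risk), self.next_task

    def step(self, action):
        """Host the next task in partition `action`; return (state, reward, done)."""
        task = self.tasks[self.next_task]
        self.loads[action] += task.load
        self.risk[action] += task.criticality
        overload = max(0.0, self.loads[action] - self.capacity)  # constraint violation
        imbalance = max(self.risk) - min(self.risk)               # uneven risk spread
        reward = -10.0 * overload - 0.1 * imbalance               # assumed weights
        self.next_task += 1
        done = self.next_task == len(self.tasks)
        return self.state(), reward, done


def least_loaded_policy(state):
    """Placeholder policy: pick the partition with the smallest current load."""
    loads, _risk, _next_task = state
    return loads.index(min(loads))


def rollout(env, policy):
    """Run one episode with `policy` and return the total reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

In this sketch a learned policy would replace `least_loaded_policy`; the environment only shows how capacity violations and uneven criticality across partitions might enter the reward signal that such an agent maximizes.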

CLC number: