Integrated avionics system safety optimization method based on deep reinforcement learning

doi:10.16265/j.cnki.issn1003-3033.2024.07.0228

Abstract:

Traditional safety design methods based on manual inspection cannot cope with the explosion of candidate hosting schemes caused by the large-scale integration of avionics systems. To address this, an avionics system partition model, a task model, and a safety criticality level quantification model are constructed, and the safety-aware integrated design optimization problem is formulated as a Markov decision process (MDP). An optimization method based on the Soft Actor-Critic (SAC) algorithm, which follows the Actor-Critic framework, is then proposed. To characterize the correlation between SAC parameter selection and training results, a sensitivity study of the algorithm parameters is carried out. To verify the advantage of the SAC-based method for safety-aware integrated design optimization, comparison experiments are conducted against the Deep Deterministic Policy Gradient (DDPG) algorithm and traditional allocation algorithms. The results show that, under the best parameter combination, the maximum reward of the SAC algorithm after convergence is nearly 8% higher than under the other parameter combinations, and the convergence time is nearly 16.6% shorter. Compared with the DDPG algorithm and the traditional allocation algorithms under the same parameter settings, the SAC-based method achieves maximum improvements of 62%, 7464%, 8370%, 2123%, and 775% in maximum reward, cumulative constraint violation rate, partition risk balance, partition resource utilization, and solution time, respectively.

Key words: deep reinforcement learning, integrated modular avionics, safety, optimization method, Markov decision process (MDP), integrated design

CLC number:

X949

ZHAO Changxiao, LI Daojun, SUN Yixuan, JING Peng, TIAN Yi. Integrated avionics system safety optimization method based on deep reinforcement learning[J]. China Safety Science Journal, 2024, 34(7): 123-131.

Figures and tables (9): Fig. 1, Fig. 2, Table 1, Table 2, Table 3, Table 4, Fig. 3, Fig. 4, Table 5

Partition ID	Core processing capacity/GHz	Memory/KB
P₁	16	128
P₂	8	256
P₃	9.6	128
P₄	9.6	128
P₅	12.8	64
P₆	11.2	512
P₇	16	128
P₈	16	256

Task ID	Group 1 processing capacity/GHz	Group 1 memory/KB	Group 2 processing capacity/GHz	Group 2 memory/KB	Group 3 processing capacity/GHz	Group 3 memory/KB
T₁	7.2	90	3.7	35	5.1	60
T₂	7.7	50	4.0	60	1.0	50
T₃	3.1	35	3.9	50	5.6	30
T₄	7.3	60	1.7	55	4.2	35
T₅	5.4	50	1.4	45	7.4	65
T₆	8.3	30	1.3	60	7.6	35
T₇	5.9	35	4.3	70	2.7	50
T₈	6.7	65	2.0	40	1.6	35
T₉	2.8	55	7.7	30	4.1	40
T₁₀	5.4	95	2.8	55	4.9	70
T₁₁	9.2	70	1.6	50	9.5	30
T₁₂	4.1	40	7.6	35	4.8	75
T₁₃	1.5	35	3.0	60	4.8	60
T₁₄	6.9	50	8.9	80	1.7	50
T₁₅	2.6	60	4.2	55	3.8	60
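
The abstract names traditional allocation algorithms as a comparison baseline. As a rough illustration of what such a baseline does with the partition and task data above, the following is a minimal sketch of a greedy best-fit placement for the Group 1 tasks. The tie-breaking rule (least processing capacity left after placement), the task order, and the names `best_fit`, `PARTITIONS`, and `GROUP1` are assumptions made for this sketch; the source does not describe the authors' implementation.

```python
# Partition capacities transcribed from the partition table above:
# (processing capacity in GHz, memory in the table's unit).
PARTITIONS = {
    "P1": (16.0, 128), "P2": (8.0, 256), "P3": (9.6, 128), "P4": (9.6, 128),
    "P5": (12.8, 64), "P6": (11.2, 512), "P7": (16.0, 128), "P8": (16.0, 256),
}

# Group 1 task demands transcribed from the task table above.
GROUP1 = {
    "T1": (7.2, 90), "T2": (7.7, 50), "T3": (3.1, 35), "T4": (7.3, 60),
    "T5": (5.4, 50), "T6": (8.3, 30), "T7": (5.9, 35), "T8": (6.7, 65),
    "T9": (2.8, 55), "T10": (5.4, 95), "T11": (9.2, 70), "T12": (4.1, 40),
    "T13": (1.5, 35), "T14": (6.9, 50), "T15": (2.6, 60),
}

EPS = 1e-9  # tolerance for floating-point capacity comparisons


def best_fit(tasks, partitions):
    """Greedy best fit (assumed variant): place each task, in the given order,
    on the feasible partition that would have the least processing capacity left."""
    free = {p: list(cap) for p, cap in partitions.items()}
    placement = {}
    for task, (cpu, mem) in tasks.items():
        feasible = [p for p, (c, m) in free.items() if c + EPS >= cpu and m >= mem]
        if not feasible:
            placement[task] = None  # no partition can host this task
            continue
        target = min(feasible, key=lambda p: free[p][0] - cpu)
        free[target][0] -= cpu
        free[target][1] -= mem
        placement[task] = target
    return placement, free


placement, free = best_fit(GROUP1, PARTITIONS)
for task, partition in placement.items():
    print(task, "->", partition)
```

A placement like this only respects the resource constraints; it does not take the safety criticality of the tasks into account, which is the gap the SAC-based method is meant to close.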

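Task ID	Number of failure conditions of the associated function			Task ID	Number of failure conditions of the associated function
	Class I	Class II	Class III		Class I	Class II	Class III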
T₁	1	5	4	T₉	1	0	3
T₂	0	4	5	T₁₀	3	0	5
T₃	2	1	4	T₁₁	2	3	0
T₄	2	4	2	T₁₂	0	4	5
T₅	0	1	3	T₁₃	1	0	4
T₆	2	0	5	T₁₄	2	1	4
T₇	2	5	6	T₁₅	2	2	3
T₈	1	0	4	—	—	—	—

Task ID	Safety criticality level	Task ID	Safety criticality level	Task ID	Safety criticality level
T₁	54	T₆	55	T₁₁	65
T₂	25	T₇	81	T₁₂	25
T₃	59	T₈	29	T₁₃	29
T₄	72	T₉	28	T₁₄	59
T₅	8	T₁₀	80	T₁₅	63
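
The safety criticality levels above are consistent with a weighted sum of the failure-condition counts in the preceding table, with weights 25, 5, and 1 for Class I, II, and III. These weights are inferred here from the tabulated values and are not stated in this excerpt, so the following check is a sketch under that assumption. The names `COUNTS`, `LEVELS`, `WEIGHTS`, and `criticality` are hypothetical.

```python
# Failure-condition counts per task (Class I, Class II, Class III),
# transcribed from the failure-condition table above.
COUNTS = {
    "T1": (1, 5, 4), "T2": (0, 4, 5), "T3": (2, 1, 4), "T4": (2, 4, 2),
    "T5": (0, 1, 3), "T6": (2, 0, 5), "T7": (2, 5, 6), "T8": (1, 0, 4),
    "T9": (1, 0, 3), "T10": (3, 0, 5), "T11": (2, 3, 0), "T12": (0, 4, 5),
    "T13": (1, 0, 4), "T14": (2, 1, 4), "T15": (2, 2, 3),
}

# Safety criticality levels transcribed from the table above.
LEVELS = {
    "T1": 54, "T2": 25, "T3": 59, "T4": 72, "T5": 8,
    "T6": 55, "T7": 81, "T8": 29, "T9": 28, "T10": 80,
    "T11": 65, "T12": 25, "T13": 29, "T14": 59, "T15": 63,
}

# Assumed weights for Class I, II, III; inferred from the data, not from the source text.
WEIGHTS = (25, 5, 1)


def criticality(counts, weights=WEIGHTS):
    """Weighted sum of failure-condition counts."""
    return sum(w * n for w, n in zip(weights, counts))


mismatches = [t for t in COUNTS if criticality(COUNTS[t]) != LEVELS[t]]
print("tasks that do not match the assumed weighting:", mismatches)
```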

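Algorithm		Integrated design optimization results
Category	Algorithm	Maximum reward	Cumulative constraint violation rate/%	Standard deviation of partition risk	Standard deviation of partition resource utilization	Solution time/s
Deep reinforcement learning	SAC	-59.73	67.91	32.85	16.83	7.69
Deep reinforcement learning	DDPG	-96.52	5137.35	123.74	90.27	67.33
Traditional allocation algorithm	Best fit	—	—	2286.5	374.23	2.3
Traditional allocation algorithm	Circular first fit	—	—	2782.3	17.09	1.7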
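
The improvement percentages quoted in the abstract can be approximately reproduced from the comparison table above if each one is taken as the gap between SAC and the worst-performing baseline, divided by the magnitude of the SAC value. That convention is an assumption inferred from the numbers, not something this excerpt states. The names `SAC`, `WORST_BASELINE`, and `improvement_pct` are hypothetical.

```python
# SAC results transcribed from the comparison table above.
SAC = {
    "max_reward": -59.73,
    "violation_rate": 67.91,
    "risk_std": 32.85,
    "util_std": 16.83,
    "time_s": 7.69,
}

# Worst baseline value per metric, transcribed from the same table.
WORST_BASELINE = {
    "max_reward": -96.52,       # DDPG
    "violation_rate": 5137.35,  # DDPG
    "risk_std": 2782.3,         # circular first fit
    "util_std": 374.23,         # best fit
    "time_s": 67.33,            # DDPG
}


def improvement_pct(sac_value, baseline_value):
    """Gap to the baseline relative to the magnitude of the SAC value, in percent (assumed convention)."""
    return abs(baseline_value - sac_value) / abs(sac_value) * 100


improvements = {k: improvement_pct(SAC[k], WORST_BASELINE[k]) for k in SAC}
for metric, pct in improvements.items():
    print(f"{metric}: {pct:.1f}%")
```

Under this convention the solution-time figure is relative to DDPG only, since both traditional allocation algorithms in the table run faster than SAC.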