Multimodal information fusion decision-making strategy for personnel behavior in industrial scene

China Safety Science Journal, 2025, Vol. 35, Issue (8): 84-92. doi: 10.16265/j.cnki.issn1003-3033.2025.08.0084

WANG Haiquan¹, YU Haowei², YANG Yueyi¹, XU Xiaobin³, BU Xiangzhou⁴, KURKOVA P⁵

¹ College of Intelligent Sensing and Instrumentation, Zhongyuan University of Technology, Zhengzhou Henan 450007, China
² School of Automation and Electrical Engineering, Zhongyuan University of Technology, Zhengzhou Henan 450007, China
³ College of Automation, Hangzhou Dianzi University, Hangzhou Zhejiang 310018, China
⁴ Henan Hongbo Measurement and Control Co., Ltd., Zhengzhou Henan 450040, China
⁵ School of Electronic and Laser Instrument, Saint Petersburg State University of Aerospace Instrumentation, Saint Petersburg 14-51, Russia

Received: 2025-03-01 Revised: 2025-05-13 Published: 2025-08-28

About the authors:
WANG Haiquan (born 1981), male, from Zhengzhou, Henan; PhD, professor, master's supervisor. His research covers multimodal data mining and fault diagnosis. E-mail: wanghq@zut.edu.cn
YANG Yueyi, lecturer
XU Xiaobin, professor
BU Xiangzhou, senior engineer
KURKOVA P, professor

Funding:
Henan Key Research and Development Special Project (251111211600); Henan Province Science and Technology Research Project (242102320215); Henan Province High-end Foreign Expert Introduction Program (HNGD2024032); Zhongyuan University of Technology Discipline Strength Improvement Program (GG202412)

Abstract:

To reduce accidents caused by workers' unsafe operating behaviors in industrial scenes, and to address the poor performance of vision-only action recognition under poor lighting, a limited field of view, and occlusion, a decision-making method for unsafe personnel behavior is proposed. It is based on a self-adaptive evidential reasoning (S-ER) strategy that fuses video information with inertial measurement unit (IMU) information. First, an attention-based multi-task convolutional 3D (M-C3D) model analyzes the video information, and a one-dimensional convolutional neural network (1D-CNN) with an attention mechanism processes the IMU information. Second, an evidential reasoning (ER) strategy performs decision-level fusion, and a firefly optimization algorithm builds an optimized set of evidence weights and reliabilities for different environmental conditions, so that the weights of the video and sensor modalities adapt to the environment. Finally, the model is validated on the public Multimodal Human Action Dataset from the University of Texas at Dallas (UTD-MHAD) and the self-built Multimodal Human Action Dataset from Zhongyuan University of Technology (ZUT-MHAD). The results show that, in industrial scenes with interference, the recognition accuracy of S-ER reaches up to 98.53%, which is 17.52% higher than the best result among traditional multimodal fusion methods and single-modality recognition methods.

Key words: industrial scene, multimodal information, information fusion, action recognition, evidential reasoning (ER) theory
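
The decision-level fusion step described in the abstract can be illustrated with a small sketch. It assumes the two-source form of the evidential reasoning rule from reference [14], restricted to the case where each model assigns all of its belief to single classes. Whether the paper uses exactly this form is an assumption, as are the function name `er_fuse`, the class scores, and the reliability values. The two weights are taken from the occlusion row of the environment table on this page.

```python
def er_fuse(p_a, p_b, w_a, w_b, r_a, r_b):
    """Combine two class-belief vectors with a two-source ER rule.

    p_a, p_b: per-class beliefs from each source (each sums to 1).
    w_a, w_b: evidence weights; r_a, r_b: evidence reliabilities, all in [0, 1].
    Assumes belief is assigned to single classes only (no belief on subsets).
    """
    if len(p_a) != len(p_b):
        raise ValueError("both sources must score the same classes")
    for value in (w_a, w_b, r_a, r_b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("weights and reliabilities must lie in [0, 1]")
    combined = []
    for belief_a, belief_b in zip(p_a, p_b):
        m_a = w_a * belief_a  # weighted support from source A
        m_b = w_b * belief_b  # weighted support from source B
        # Bounded sum of individual support plus the product of joint support.
        combined.append((1.0 - r_b) * m_a + (1.0 - r_a) * m_b + m_a * m_b)
    total = sum(combined)
    if total == 0.0:
        raise ValueError("the two sources give no combined support")
    return [m / total for m in combined]


# Hypothetical class scores for three behavior classes (not from the paper).
p_video = [0.50, 0.40, 0.10]  # stand-in for an M-C3D output
p_imu = [0.20, 0.70, 0.10]    # stand-in for a 1D-CNN output
# Weights: occlusion row of the environment table. Reliabilities: made up.
fused = er_fuse(p_video, p_imu, w_a=0.55, w_b=0.67, r_a=0.60, r_b=0.80)
print(fused.index(max(fused)), [round(v, 3) for v in fused])
```

With these inputs the video model prefers the first class and the IMU model the second. Because the IMU source carries the larger weight, the fused decision follows the IMU model, which is the kind of environment-dependent shift the abstract describes.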

CLC number: X928.03 (accident prevention and prediction)

Cite this article:

WANG Haiquan, YU Haowei, YANG Yueyi, XU Xiaobin, BU Xiangzhou, KURKOVA P. Multimodal information fusion decision-making strategy for personnel behavior in industrial scene[J]. China Safety Science Journal, 2025, 35(8): 84-92.


References (31)

[1]	JI Zhian, ZHOU Yunyi, ZHANG Yuyuan, et al. Industrial site unsafe behavior detection based on improved YOLOv5[J]. China Safety Science Journal, 2024, 34(7): 38-43. doi: 10.16265/j.cnki.issn1003-3033.2024.07.2030
[2]	ZHAO Jiajun, LIU Zhiqiang, XIE Sijia, et al. Research on the application of body posture action feature extraction and recognition comparison[J]. IET Image Processing, 2023, 17(1): 104-117.
[3]	LIU Yao, JIAO Shuangjian. Application of ST-GCN in unsafe action identification of construction workers[J]. China Safety Science Journal, 2022, 32(4): 30-35. doi: 10.16265/j.cnki.issn1003-3033.2022.04.005
[4]	QIU Sen, ZHAO Hongkai, JIANG Nan, et al. Multi-sensor information fusion based on machine learning for real applications in human activity recognition: state-of-the-art and research challenges[J]. Information Fusion, 2022, 80: 241-265.
[5]	LIU Liujun, YANG Jiewen, LIN Ye, et al. 3D human pose estimation with single image and inertial measurement unit (IMU) sequence[J]. Pattern Recognition, 2024: DOI: 10.1016/j.patcog.2023.110175.
[6]	SUN Qing, YANG Chaoyu. A multi-modal detection method for holding ladders in underground climbing operations[J]. Journal of Mine Automation, 2024, 50(5): 142-150.
[7]	DAI Zhuangzhuang, PARK J, KASZOWSKA A, et al. Detecting worker attention lapses in human-robot interaction: an eye tracking and multimodal sensing study[C]. International Conference on Automation and Computing (ICAC), 2023: 1-6.
[8]	WANG Yu, YU Chunhua, CHEN Xiaoqing, et al. Recognition of unsafe behaviors of underground personnel based on multi modal feature fusion[J]. Journal of Mine Automation, 2023, 49(11): 138-144.
[9]	NIAN Lihui. A model for identifying unsafe behaviors of construction workers based on multimodal fusion[J]. Journal of Jiamusi University, 2024, 42(9): 118-121, 148.
[10]	CHEN Jiajing. Research and implementation of multimodal human activity recognition based on WiFi and vision[D]. Beijing: Beijing University of Posts and Telecommunications, 2024.
[11]	HE Jun, ZHANG Caiqing, LI Xiaozhen, et al. Survey of research on multimodal fusion technology for deep learning[J]. Computer Engineering, 2020, 46(5): 1-11. doi: 10.19678/j.issn.1000-3428.0057370
[12]	SHAFER G. Dempster-Shafer theory[J]. Encyclopedia of Artificial Intelligence, 1992, 1: 330-331.
[13]	ZHANG Guoliang, JIA Songmin, LI Xiuzhi, et al. Weighted score-level feature fusion based on Dempster-Shafer evidence theory for action recognition[J]. Journal of Electronic Imaging, 2018, 27(1): DOI: 10.1117/1.jei.27.1.013021.
[14]	YANG Jianbo, XU Dongling. Evidential reasoning rule for evidence combination[J]. Artificial Intelligence, 2013, 205: 1-29.
[15]	XU Xiaobin, HUANG Weidong, ZHANG Xuelin, et al. An evidential reasoning-based information fusion method for fault diagnosis of ship rudder[J]. Ocean Engineering, 2025, 318: DOI: 10.1016/j.oceaneng.2024.120082.
[16]	FAN Xuecheng, XU Zeshui. Double-level multi-attribute group decision-making method based on intuitionistic fuzzy theory and evidence reasoning[J]. Cognitive Computation, 2023, 15: 838-855.
[17]	NABOT A. A correction: utilizing evidential reasoning (ER) approach for software components selection[J]. SN Computer Science, 2025, 295(6): DOI: 10.1007/s42979-025-03878-6.
[18]	XIE Jiming, ZHANG Yan, LI Ke, et al. Inspiration in human reasoning logic: automating the inference and analysis of traffic accident information via macro-micro integration[J]. Traffic Injury Prevention, 2025: DOI: 10.1080/15389588.2025.2486572.
[19]	WANG Shuchang. Research on health state assessment method of lithium-ion batteries based on the evidence reasoning rule[D]. Harbin: Harbin Normal University, 2024.
[20]	LAI Weiwen, HE Wei. Ensemble learning method based on adaptive evidential reasoning rule[J]. Application Research of Computers, 2023, 40(8): 2281-2285, 2297.
[21]	ZHAO Ruirui, SUN Jianbin, YOU Yaqian, et al. Construction and application of dynamic classifier based on evidential reasoning rule[J]. Systems Engineering-Theory & Practice, 2022, 42(8): 2258-2276. doi: 10.12011/SETP2021-2338
[22]	YANG Xinshe. Firefly algorithm, stochastic test functions and design optimisation[J]. International Journal of Bio-inspired Computation, 2010, 2(2): 78-84.
[23]	JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231.
[24]	TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. Proceedings of the IEEE International Conference on Computer Vision, 2015: 4489-4497.
[25]	WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11534-11542.
[26]	WANG Huan, LIU Zhiliang, PENG Dandan, et al. Understanding and learning discriminant features based on multiattention 1DCNN for wheelset bearing fault diagnosis[J]. IEEE Transactions on Industrial Informatics, 2019, 16(9): 5735-5745.
[27]	CHEN Chen, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]. 2015 IEEE International Conference on Image Processing (ICIP), 2015: 168-172.
[28]	LIN Fang, WANG Zhelong, ZHAO Hongyu, et al. Adaptive multi-modal fusion framework for activity monitoring of people with mobility disability[J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26(8): 4314-4324.
[29]	GAN Lipeng, CAO Runze, LI Ning, et al. Focal channel knowledge distillation for multi-modality action recognition[J]. IEEE Access, 2023, 11: 78285-78298.
[30]	IMRAN J, RAMAN B. Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition[J]. Journal of Ambient Intelligence and Humanized Computing, 2020, 11(1): 189-208.
[31]	ZHONG Zhuokun, HOU Zhenjie, LIANG Jiuzhen, et al. Multimodal cooperative self-attention network for action recognition[J]. IET Image Processing, 2023, 17(6): 1775-1783.

Dataset		Modalities	Accuracy/%	Method
UTD-MHAD	SS	D,S	90.0	Multimodal Cooperative Self-attention Network for Action Recognition (MGAT) [31]
	SS	D,S,V	96.0	Focal Channel Knowledge Distillation for Multi-Modality Action Recognition (FCKD) [29]
	SG	V,I	98.60	S-ER
		D,I	97.20	Changzhou University: Comprehensive Multi-modal Human Action (CZU-MHAD) [27]
		V,S,I	97.90	Stacked Dense Flow Difference Image (SDFD) [30]
		V,I	98.75	New Supervised Adaptive Multi-modal Fusion Method (AMFM) [28]
		V,I	99.50	S-ER

Environment	M-C3D accuracy/%	1D-CNN accuracy/%	M-C3D weight	1D-CNN weight
Good lighting	90.00	82.60	0.85	0.55
Poor lighting	83.51	82.60	0.78	0.64
With occlusion	77.80	82.90	0.55	0.67
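
The abstract states that a firefly optimization algorithm produces the per-environment weight sets, of which the table above shows the weights. The following is a minimal sketch of the standard firefly algorithm from reference [22], not the authors' implementation. The function name `firefly_minimize`, all hyperparameter values, and the stand-in objective are assumptions. The objective is a simple quadratic whose minimum is placed at the occlusion-row weights (0.55, 0.67) purely so the sketch has something to search for. The paper's actual objective, presumably a validation error of the fused classifier, is not given on this page.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize `objective` over a box with a basic firefly algorithm."""
    lo, hi = bounds
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_i]), cost[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    dist2 = sum((a - b) ** 2
                                for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * dist2)  # attractiveness
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), cost[i]
        alpha *= 0.97  # shrink the random step as the search settles
    return best_x, best_cost


def toy_objective(w):
    """Stand-in for a validation error; minimum placed at (0.55, 0.67)."""
    return (w[0] - 0.55) ** 2 + (w[1] - 0.67) ** 2


weights, error = firefly_minimize(toy_objective, dim=2, bounds=(0.0, 1.0))
print([round(v, 3) for v in weights], error)
```

In an adaptive scheme of the kind the abstract describes, one such search would be run per environment class (good lighting, poor lighting, occlusion), and the resulting weight and reliability pair would be looked up at inference time once the environment has been identified.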
