Multimodal information fusion decision-making strategy for personnel behavior in industrial scene

doi:10.16265/j.cnki.issn1003-3033.2025.08.0084

Abstract

Abstract:

In order to reduce the accidents in industrial scenarios which were caused by workers'unsafe operation behaviors, meanwhile improve the performance of visual-based action recognition methods in industrial scene with poor lighting, limited field of view and occlusions, an improved decision-making strategy based on self-adaptive ER (S-ER) was introduced in this paper. This strategy could integrate video information and inertial measurement unit (IMU) information effectively. It firstly analyzed video information and IMU information with attention mechanism-based multi-task convolutional 3D (M-C3D) model as well as one-dimensional convolutional neural network (1D-CNN) fused with attention mechanism, then ER theory was introduced to achieve decision-level fusion, where the set of evidence weights and reliability under different environmental conditions was optimized through the firefly optimization algorithm for improving the recognition accuracy and robustness of the model. The effectiveness of the proposed algorithm was verified on the public dataset Multimodal Human Action Dataset from University of Texas at Dallas(UTD-MHAD) and the self-built dataset Multimodal Human Action Dataset from Zhongyuan University of Technology(ZUT-MHAD). The results show that the identification results of S-ER for workers'unsafe behaviors in complex industrial scenarios can reach up to 98.53%, which is 17.52% higher than the maximum value of traditional multimodal fusion methods and single-modality recognition methods.

Key words: industrial scene, multimodal information, information fusion, action recognition, evidence reasoning (ER) theory

CLC Number:

X928.03事故预防与预测

WANG Haiquan, YU Haowei, YANG Yueyi, XU Xiaobin, BU Xiangzhou, KURKOVA P. Multimodal information fusion decision-making strategy for personnel behavior in industrial scene[J]. China Safety Science Journal, 2025, 35(8): 84-92.

Figures/Tables 11

Fig.1

Fig.2

Fig.3

Fig.4

Fig.5

Fig.6

Fig.7

Table 1

Fig.8

Fig.9

Table 2

References 31

[1]	纪执安, 周云奕, 张玉媛, 等. 基于改进YOLOv5的工业现场不安全行为检测[J]. 中国安全科学学报, 2024, 34(7):38-43. doi: 10.16265/j.cnki.issn1003-3033.2024.07.2030
	JI Zhian, ZHOU Yunyi, ZHANG Yuyuan, et al. Industrial site unsafe behavior detection based on improved YOLOv5[J]. China Safety Science Journal, 2024, 34(7): 38-43. doi: 10.16265/j.cnki.issn1003-3033.2024.07.2030
[2]	ZHAO Jiajun, LIU Zhiqiang, XIE Sijia, et al. Research on the application of body posture action feature extraction and recognition comparison[J]. IET Image Processing, 2023, 17(1): 104-117.
[3]	刘耀, 焦双健. ST-GCN在建筑工人不安全动作识别中的应用[J]. 中国安全科学学报, 2022, 32(4):30-35. doi: 10.16265/j.cnki.issn1003-3033.2022.04.005
	LIU Yao, JIAO Shuangjian. Application of ST-GCN in unsafe action identification of construction workers[J]. China Safety Science Journal, 2022, 32(4): 30-35. doi: 10.16265/j.cnki.issn1003-3033.2022.04.005
[4]	QIU Sen, ZHAO Hongkai, JIANG Nan, et al. Multi-sensor information fusion based on machine learning for real applications in human activity recognition: state-of-the-art and research challenges[J]. Information Fusion, 2022, 80: 241-265.
[5]	LIU Liujun, YANG Jiewen, LIN Ye, et al. 3D human pose estimation with single image and inertial measurement unit (IMU) sequence[J]. Pattern Recognition, 2024: DOI: 10.1016/j.patcog.2023.110175.
[6]	孙晴, 杨超宇. 基于多模态的井下登高作业专人扶梯检测方法[J]. 工矿自动化, 2024, 50(5):142-150.
	SUN Qing, YANG Chaoyu. A multi-modal detection method for holding ladders in underground climbing operations[J]. Journal of Mine Automation, 2024, 50(5): 142-150.
[7]	DAI Zhuangzhuang, PARK J, KASZOWSKA A, et al. Detecting worker attention lapses in human-robot interaction: an eye tracking and multimodal sensing study[C]. International Conference on Automation and Computing (ICAC), 2023: 1-6.
[8]	王宇, 于春华, 陈晓青, 等. 基于多模态特征融合的井下人员不安全行为识别[J]. 工矿自动化, 2023, 49(11):138-144.
	WANG Yu, YU Chunhua, CHEN Xiaoqing, et al. Recognition of unsafe behaviors of underground personnel based on multi modal feature fusion[J]. Journal of Mine Automation, 2023, 49(11): 138-144.
[9]	年立辉. 基于多模态融合构建的建设工程工人不安全行为识别模型[J]. 佳木斯大学学报:自然科学版, 2024, 42(9):118-120, 148.
	NIAN Lihui. A model for identifying unsafe behaviors of construction workers based on multimodal fusion[J]. Journal of Jiamusi University, 2024, 42(9): 118-121, 148.
[10]	陈佳菁. 基于WiFi和视频的多模态人体动作识别研究与实现[D]. 北京: 北京邮电大学, 2024.
	CHEN Jiajing. Research and implementation of multimodal human activity recognition based on WiFi and vision[D]. Beijing: Beijing University of Posts and Telecommunications, 2024.
[11]	何俊, 张彩庆, 李小珍, 等. 面向深度学习的多模态融合技术研究综述[J]. 计算机工程, 2020, 46(5):1-11. doi: 10.19678/j.issn.1000-3428.0057370
	HE Jun, ZHANG Caiqing, LI Xiaozhen, et al. Survey of research on multimodal fusion technology for deep learning[J]. Computer Engineering, 2020, 46(5):1-11. doi: 10.19678/j.issn.1000-3428.0057370
[12]	SHAFER G. Dempster-shafer theory[J]. Encyclopedia of Artificial Intelligence, 1992,1:330-331.
[13]	ZHANG Guoliang, JIA Songmin, LI Xiuzhi, et al. Weighted score-level feature fusion based on Dempster-Shafer evidence theory for action recognition[J]. Journal of Electronic Imaging, 2018, 27(1): DOI: 10.1117/1.jei.27.1.013021.
[14]	YANG Jianbo, XU Dongling. Evidential reasoning rule for evidence combination[J]. Artificial Intelligence, 2013, 205: 1-29.
[15]	XU Xiaobin, HUANG Weidong, ZHANG Xuelin, et al. An evidential reasoning-based information fusion method for fault diagnosis of ship rudder[J]. Ocean Engineering, 2025, 318: DOI: 10.1016/j.oceaneng.2024.120082.
[16]	FAN Xuecheng, XU Zeshui. Double-level multi-attribute group decision-making method based on intuitionistic fuzzy theory and evidence reasoning[J]. Cognitive Computation, 2023, 15: 838-855.
[17]	NABOT A. A correction: utilizing evidential reasoning (ER) approach for software components selection[J]. SN Computer Science, 2025, 295(6): DOI: 10.1007/s42979-025-03878-6.
[18]	XIE Jiming, ZHANG Yan, LI Ke, et al. Inspiration in human reasoning logic: automating the inference and analysis of traffic accident information via macro-micro integration[J]. Traffic Injury Prevention, 2025: DOI: 10.1080/15389588.2025.2486572.
[19]	王树畅. 基于证据推理规则的锂离子电池健康状态评估方法研究[D]. 哈尔滨: 哈尔滨师范大学, 2024.
	WANG Shuchang. Research on health state assessment method of lithium-ion batteries based on the evidence reasoning rule[D]. Harbin: Harbin Normal University, 2024.
[20]	赖尉文, 贺维. 一种基于自适应证据推理规则的集成学习方法[J]. 计算机应用研究, 2023, 40(8):2281-2285, 2 297.
	LAI Weiwen, HE Wei. Ensemble learning method based on adaptive evidential reasoning rule[J]. Application Research of Computers, 2023, 40(8): 2281-2285, 2 297.
[21]	赵蕊蕊, 孙建彬, 游雅倩, 等. 动态ER Rule分类器构建与应用[J]. 系统工程理论与实践, 2022, 42(8):2258-2276. doi: 10.12011/SETP2021-2338
	ZHAO Ruirui, SUN Jianbin, YOU Yaqian, et al, Construction and application of dynamic classifier based on evidential reasoning rule[J]. Systems Engineering-Theory & Practice, 2022, 42(8): 2258-2276.
[22]	YANG Xinshe. Firefly algorithm, stochastic test functions and design optimisation[J]. International Journal of Bio-inspired Computation, 2010, 2(2): 78-84.
[23]	JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1): 221-231.
[24]	TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. Proceedings of the IEEE International Conference on Computer Vision, 2015: 4489-4497.
[25]	WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11 534-11 542.
[26]	WANG Huan, LIU Zhiliang, PENG Dandan, et al. Understanding and learning discriminant features based on multiattention 1DCNN for wheelset bearing fault diagnosis[J]. IEEE Transactions on Industrial Informatics, 2019, 16(9): 5735-5745.
[27]	CHEN Chen, JAFARI R, KEHTARNAVAZ N. UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor[C]. 2015 IEEE International Conference on Image Processing (ICIP), 2015: 168-172.
[28]	LIN Fang, WANG Zhelong, ZHAO Hongyu, et al. Adaptive multi-modal fusion framework for activity monitoring of people with mobility disability[J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26(8): 4314-4324.
[29]	GAN Lipeng, CAO Runze, LI Ning, et al. Focal channel knowledge distillation for multi-modality action recognition[J]. IEEE Access, 2023,11: 78 285-78 298.
[30]	IMRAN J, RAMAN B. Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition[J]. Journal of Ambient Intelligence and Humanized Computing, 2020, 11(1): 189-208.
[31]	ZHONG Zhuokun, HOU Zhenjie, LIANG Jiuzhen, et al. Multimodal cooperative self-attention network for action recognition[J]. IET Image Processing, 2023, 17(6): 1775-1783.

数据集		模态信息	准确率/%	方法
UTD-MHAD	SS	D,S	90.0	多模态协同自注意力动作识别网络(Multimodal Cooperative Self-attention Network for Action Recognition,MGAT)^[31]
	SS	D,S,V	96.0	焦点通道知识蒸馏的多模态行为识别方法 (Focal Channel Knowledge Distillation for Multi- Modality Action Recognition,FCKD)^[29]
	SG	V,I	98.60	S-ER
		D,I	97.20	常州大学综合多模态人员识别方法(Changzhou University: Comprehensive Multi-modal Human Action, CZU-MHAD )^[27]
		V,S,I	97.90	堆叠密集流差分图像(Stacked Dense Flow Difference Image,SDFD)^[30]
		V,I	98.75	新型监督式自适应多模态融合方法(New Supervised Adaptive Multi-modal Fusion Method, AMFM)^[28]
		V,I	99.50	S-ER

环境类别	M-C3D 准确率/%	1D-CNN 准确率/%	M-C3D 权重	1D-CNN 权重
光线良好	90.00	82.60	0.85	0.55
光线不佳	83.51	82.60	0.78	0.64
有遮挡时	77.80	82.90	0.55	0.67