China Safety Science Journal, 2025, Vol. 35, Issue (11): 32-41. doi: 10.16265/j.cnki.issn1003-3033.2025.11.0357

Safety social science and safety management

Unsafe behavior recognition of miners in coal mine belt area based on multimodal feature fusion

HAO Qinxia, ZHEN Haolong

  1. School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an Shaanxi 710054, China
  Received: 2025-06-07; Revised: 2025-09-10; Online: 2025-11-28; Published: 2026-05-28

Abstract:

To improve the accuracy and robustness of miner behavior recognition in underground belt conveyor areas and thereby reduce unsafe behaviors, a recognition method based on multimodal feature fusion was proposed. The method integrates the RGB and skeleton modalities. An improved Efficient Attention (EA)-SlowFast network extracts RGB features, and an improved Mobile-Enhanced (ME)-YOLO detects miners. Lite-HRNet then performs pose estimation on each detected miner to extract skeletal keypoints, which an improved Depthwise Convolution and Channel Prior Attention Spatio-Temporal Graph Convolutional Network (DC-STGCN) encodes into skeletal features. Finally, the RGB and skeletal features are combined by late fusion to recognize unsafe behaviors. The method was validated on two public datasets, Human Motion Database 51 (HMDB51) and the University of Central Florida 101 Action Recognition Dataset (UCF101), and on a self-constructed dataset of unsafe behaviors in underground coal mines. The results show that, compared with the single-modal baseline models, EA-SlowFast and DC-STGCN achieve recognition accuracies of 71.6% and 68.3% on HMDB51 and 88.3% and 85.4% on UCF101, respectively. The multimodal fusion model outperforms the fusion baseline without the proposed improvements, reaching 75.4% on HMDB51 and 93.5% on UCF101. On the self-constructed dataset, ME-YOLO attains a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 91.8% at an inference speed of 38 frames/s, which meets real-time requirements, and the fusion model reaches an accuracy of 90.6%. These results confirm the effectiveness of the proposed approach in complex underground belt conveyor environments.
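The abstract does not specify how the RGB and skeleton streams are combined in the late-fusion step. The following is a minimal sketch, assuming score-level fusion by a weighted average of per-class softmax probabilities, a common late-fusion scheme that is not necessarily the authors' exact method. The function names, the fusion weight `w_rgb`, the class labels, and the logit values are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(rgb_logits: Sequence[float],
              skel_logits: Sequence[float],
              w_rgb: float = 0.5) -> List[float]:
    """Weighted average of per-class probabilities from the two streams.

    Assumption: score-level fusion; w_rgb is a hypothetical weight for the
    RGB stream, and the skeleton stream receives 1 - w_rgb.
    """
    if len(rgb_logits) != len(skel_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= w_rgb <= 1.0:
        raise ValueError("w_rgb must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_skel = softmax(skel_logits)
    return [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_skel)]


def predict(rgb_logits: Sequence[float],
            skel_logits: Sequence[float],
            w_rgb: float = 0.5) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = late_fuse(rgb_logits, skel_logits, w_rgb)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical class labels and logits, for illustration only.
    classes = ["normal_walking", "crossing_belt", "sitting_on_belt"]
    rgb = [2.0, 1.0, 0.1]   # RGB stream alone would pick class 0
    skel = [0.5, 2.5, 0.0]  # skeleton stream alone would pick class 1
    print(classes[predict(rgb, skel)])             # -> crossing_belt
    print(classes[predict(rgb, skel, w_rgb=0.9)])  # -> normal_walking
```

With equal weights, the more confident skeleton stream overrides the RGB stream's preference. Shifting `w_rgb` toward 1 lets the RGB stream dominate. In practice such a weight would be tuned on validation data.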

Key words: multimodal, feature fusion, coal mine belt area, unsafe behaviors of miners, object detection, red-green-blue (RGB) modality, skeleton modality
