China Safety Science Journal ›› 2025, Vol. 35 ›› Issue (11): 32-41. doi: 10.16265/j.cnki.issn1003-3033.2025.11.0357

• Safety Social Science and Safety Management •

Unsafe behavior recognition of miners in coal mine belt area based on multimodal feature fusion

HAO Qinxia(), ZHEN Haolong

  1. School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an Shaanxi 710054, China
  • Received: 2025-06-07 Revised: 2025-09-10 Published: 2025-11-28
  • About the author:

    HAO Qinxia (1980—), female, native of Xi'an, Shaanxi; PhD, associate professor; her research focuses on Internet of Things engineering applications and coal mine safety. E-mail:

  • Funding:
    Key Research and Development Program of Shaanxi Province (2024GX-YBXM-526)



Abstract:

To enhance the accuracy and robustness of miner behavior recognition in underground belt conveyor areas and to reduce the occurrence of unsafe behaviors, a multimodal feature fusion-based recognition method was proposed. The method integrated RGB and skeletal modalities: an Efficient Attention (EA)-improved SlowFast network (EA-SlowFast) was employed to extract RGB features, and an improved Mobile-Enhanced YOLO (ME-YOLO) was utilized for miner detection. The detected miners were subsequently processed by Lite-HRNet for pose estimation to extract skeletal keypoints, which were further encoded using a Depthwise Convolution and Channel Prior Attention Spatio-Temporal Graph Convolutional Network (DC-STGCN) to obtain skeletal features. Finally, late fusion of the RGB and skeletal features was performed for unsafe behavior recognition. In the experimental design, both public datasets (Human Motion Database 51 (HMDB51) and the University of Central Florida 101 Action Recognition Dataset (UCF101)) and a self-constructed underground coal mine unsafe behavior dataset were adopted for validation. The results demonstrate that, compared with single-modal baseline models, EA-SlowFast and DC-STGCN achieve recognition accuracies of 71.6% and 68.3% on HMDB51, and 88.3% and 85.4% on UCF101, respectively. The multimodal fusion model outperforms the unimproved fusion baseline, achieving 75.4% on HMDB51 and 93.5% on UCF101. On the self-constructed dataset, ME-YOLO attains a mean average precision (mAP)@0.5 of 91.8% with an inference speed of 38 frames/s, satisfying real-time requirements. The fusion model further achieves a recognition accuracy of 90.6%, confirming the effectiveness of the proposed approach in complex underground belt conveyor environments.
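The late-fusion step described above can be illustrated with a minimal sketch: each branch produces per-class scores, the scores are converted to probabilities, and a weighted combination yields the final prediction. The class count, branch scores, and fusion weights below are hypothetical for illustration; the abstract does not specify how the two streams are weighted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def late_fuse(rgb_scores, skel_scores, w_rgb=0.5, w_skel=0.5):
    """Weighted late fusion of per-class scores from two modality branches.

    rgb_scores stand in for the RGB branch (EA-SlowFast in the paper) and
    skel_scores for the skeleton branch (DC-STGCN); the equal fusion
    weights here are illustrative, not values from the paper.
    """
    fused = (w_rgb * softmax(np.asarray(rgb_scores, dtype=float))
             + w_skel * softmax(np.asarray(skel_scores, dtype=float)))
    return int(np.argmax(fused)), fused

# Toy 3-class example: the two branches disagree on the top class,
# and fusion resolves the conflict by combining their probabilities.
pred, probs = late_fuse([2.0, 1.0, 0.1], [1.5, 2.0, 0.2])
```

Fusing probabilities rather than raw scores keeps the two branches on a comparable scale, which is the usual motivation for late fusion of heterogeneous networks.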

Key words: multimodal, feature fusion, coal mine belt area, unsafe behaviors of miners, object detection, red-green-blue (RGB) modality, skeleton modality

CLC number: