China Safety Science Journal ›› 2025, Vol. 35 ›› Issue (3): 115-124. doi: 10.16265/j.cnki.issn1003-3033.2025.03.0863

• Safety Engineering Technology •

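  • Corresponding author:
    ** LEI Jingfa (b. 1978), male, from Chaohu, Anhui; Ph.D., professor. His research focuses on visual inspection and human factors engineering. E-mail:
  • About the authors:

    ZHANG Miao (b. 1986), male, from Xiangcheng, Henan; Ph.D., lecturer. His research focuses on machine vision and ergonomics. E-mail:

    ZHAO Ruhai, associate professor;

    LI Yongling, lecturer

  • Funding:
    Major Natural Science Research Project of Anhui Universities (KJ2021ZD0068); Collaborative Innovation Project of Anhui Universities (GXXT2022-019); Open Fund of the Sichuan Provincial Key Laboratory of Process Equipment and Control Engineering (GK202308)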

Lightweight neural network combined with depth camera for miner target detection and localization

ZHANG Miao1,2, WANG Xiaojun1,2, LEI Jingfa1,2,3,**, ZHAO Ruhai1,2, LI Yongling1,2

  1 School of Mechanical and Electrical Engineering, Anhui Jianzhu University, Hefei Anhui 230601, China
  2 Key Laboratory of Intelligent Manufacturing of Construction Machinery, Anhui Education Department, Hefei Anhui 230601, China
  3 Sichuan Provincial Key Laboratory of Process Equipment and Control Engineering, Zigong Sichuan 643000, China
  • Received: 2024-10-23 Revised: 2024-12-25 Published: 2025-03-28

Abstract:

To prevent miners from straying into dangerous areas, a lightweight underground miner detection model, YOLOv5s-MPD, was proposed and combined with a depth camera to locate miner targets and detect in real time whether miners had entered dangerous areas. First, the MobileNetv3 lightweight neural network was used as the backbone feature extraction network to significantly reduce the model size. Second, the Polarized Self-Attention (PSA) module was introduced to enhance the model's perception of targets. Finally, Deformable Convolution Network v2 (DCNv2) replaced the standard convolution in the C3 module of the feature fusion layer, addressing the partial loss of feature information in conventional convolution. The improved model was applied to the color images captured by the depth camera to detect miner targets and obtain the three-dimensional spatial coordinates of each target's center point. The results show that, compared with YOLOv5s, the improved model reduces the number of parameters and the computational cost by 83.54% and 77.03%, respectively. The model size is only 3.4 MB, the detection speed is 70.2 frames/s (an increase of 54.97%), and the mean average precision (mAP) is 0.825. Compared with mainstream object detection models, the improved model is relatively well balanced in parameter count, computational cost, model size, detection speed, and mAP. In the positioning accuracy test, over a range of 1 to 8 m, the mean absolute error and mean relative error of the measured distance between the camera and the miner target were 0.11 m and 1.74%, respectively, and the maximum absolute error and maximum relative error were 0.25 m and 2.96%, respectively. In dynamic detection, the miner targets were detected and their position information was output in all cases, with a detection success rate of 97.5%.
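The localization step described in the abstract can be illustrated with a minimal Python sketch. It assumes a standard pinhole camera model with a depth image aligned to the color image. This is an assumption, since the abstract does not state the projection model or the camera used. The names `Intrinsics`, `box_center`, `deproject`, `in_danger_zone`, and `error_stats` are hypothetical, as is the rule that a miner is "in the danger zone" when the camera-to-target distance falls below a threshold.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Intrinsics(NamedTuple):
    """Pinhole intrinsics of the color camera (hypothetical container)."""
    fx: float  # focal length in pixels, x axis
    fy: float  # focal length in pixels, y axis
    cx: float  # principal point, x (pixels)
    cy: float  # principal point, y (pixels)


def box_center(x1: float, y1: float, x2: float, y2: float) -> Tuple[float, float]:
    """Center pixel of a detection box given its corner coordinates."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def deproject(u: float, v: float, depth_m: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Map a pixel (u, v) with depth in meters to camera-frame (X, Y, Z)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def distance_to_camera(point: Sequence[float]) -> float:
    """Euclidean distance from the camera origin to a 3D point."""
    return math.sqrt(sum(c * c for c in point))


def in_danger_zone(point: Sequence[float], max_range_m: float) -> bool:
    """Assumed rule: the target is in the zone if it is within max_range_m."""
    return distance_to_camera(point) <= max_range_m


def error_stats(measured: Sequence[float], actual: Sequence[float]):
    """Return (mean abs, mean rel, max abs, max rel) errors of distances."""
    abs_err = [abs(m - a) for m, a in zip(measured, actual)]
    rel_err = [e / a for e, a in zip(abs_err, actual)]
    n = len(abs_err)
    return (sum(abs_err) / n, sum(rel_err) / n, max(abs_err), max(rel_err))


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # illustrative values
    u, v = box_center(300, 200, 340, 280)
    p = deproject(u, v, depth_m=2.0, k=k)
    print(p, distance_to_camera(p), in_danger_zone(p, max_range_m=3.0))
```

In this sketch the depth value would be read from the depth image at the box center. `error_stats` mirrors the kind of absolute and relative distance errors the abstract reports, without reproducing the authors' measurement procedure.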

Key words: lightweight, neural network, depth camera, target detection, target localization, safety warning

CLC number: