China Safety Science Journal ›› 2026, Vol. 36 ›› Issue (3): 104-112. doi: 10.16265/j.cnki.issn1003-3033.2026.03.0881

• Safety Technology and Engineering •

水电工程施工安全隐患图文多模态智能识别方法*

聂本武1,2,3, 陈述1,3,**, 陈云1,3, 田雪琪2, 曹坤煜1, 李智4

    1 三峡大学 水电工程施工与管理湖北省重点实验室, 湖北 宜昌 443002
    2 国家能源投资集团有限责任公司 金沙江分公司, 四川 成都 610041
    3 三峡大学 水利与环境学院, 湖北 宜昌 443002
    4 中国长江三峡集团有限公司, 湖北 武汉 430000
  • 收稿日期:2025-09-14 修回日期:2025-12-11 出版日期:2026-03-31
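  • Corresponding author:
    ** CHEN Shu (b. 1986), male, from Yingshan, Hubei; Ph.D., professor; his research focuses on construction safety management in hydropower engineering.
  • About the authors:

    NIE Benwu (b. 1987), male, from Xiaogan, Hubei; doctoral candidate and senior engineer; his main research interest is intelligent construction of hydropower projects.

    CHEN Yun, associate professor.

    LI Zhi, professor-level senior engineer.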

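  • Supported by:
    National Natural Science Foundation of China (52479127); National Natural Science Foundation of China (52209163); Natural Science Foundation of Hubei Province, Youth Category A Project (2025AFA074); Natural Science Foundation of Hubei Province, Innovation and Development Joint Fund (2024AFD153)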

An image-text multimodal intelligent identification method for construction safety hazards in hydropower engineering

NIE Benwu1,2,3, CHEN Shu1,3,**, CHEN Yun1,3, TIAN Xueqi2, CAO Kunyu1, LI Zhi4

    1 Hubei Key Laboratory of Construction and Management in Hydropower Engineering, China Three Gorges University, Yichang Hubei 443002, China
    2 Jinshajiang Branch, China Energy Investment Corporation, Chengdu Sichuan 610041, China
    3 College of Hydraulic & Environmental Engineering, China Three Gorges University, Yichang Hubei 443002, China
    4 China Three Gorges Corporation, Wuhan Hubei 430000, China
  • Received:2025-09-14 Revised:2025-12-11 Published:2026-03-31

摘要:

为解决水电工程施工安全隐患识别存在单模态特征表达不完整、图文特征融合效率低等问题,提出水电工程施工安全隐患图文多模态智能识别方法。首先,针对水电工程施工特点,定义12种施工安全隐患分类特征,建立水电工程施工安全隐患图文多模态数据集;其次,利用双向变换器模型(BERT)和视觉变换器(ViT)模型分别提取隐患文本与图像特征,引入门控融合网络(GFN),动态调节图文特征贡献度,捕捉多模态关联特征信息,通过多层感知器提高对多模态分类识别精度;最后,通过对比试验,检验模型识别的准确性与可靠性。结果表明:该方法通过增强识别稳定性,实现对多模态隐患特征的优化贡献,多模态隐患识别准确率高达84.99%,较文本模型提升1.73%,较图像模型提升12.24%,隐患识别分类优于已有基准模型,有助于提升安全隐患智能识别的鲁棒性。

关键词: 水电工程, 施工安全隐患, 多模态, 门控融合网络(GFN), 智能识别

Abstract:

To address the incomplete unimodal feature representation and low image-text fusion efficiency in construction safety hazard identification for hydropower projects, an image-text multimodal intelligent identification method was proposed. First, 12 categories of construction safety hazards were defined according to the characteristics of hydropower construction, and an image-text multimodal dataset of hydropower construction safety hazards was established. Second, a bidirectional encoder representations from transformers (BERT) model and a vision transformer (ViT) model were employed to extract hazard text features and image features, respectively. A gated fusion network (GFN) was then introduced to dynamically adjust the contributions of the image and text features and capture cross-modal correlated feature information, and a multi-layer perceptron was used to improve multimodal classification accuracy. Finally, comparative experiments were conducted to verify the accuracy and reliability of the model. The results show that the method optimizes the contribution of multimodal hazard features by enhancing identification stability. The multimodal hazard identification accuracy reaches 84.99%, an improvement of 1.73% over the text-based model and 12.24% over the image-based model. The proposed approach outperforms existing benchmark models in hazard classification and helps improve the robustness of intelligent hazard identification.
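The gating step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' GFN: the abstract does not give the gate equations, so the form used here (a sigmoid gate computed from the concatenated text and image features, followed by a per-dimension convex blend and a two-layer perceptron over 12 classes) is an assumption based on a common gated-fusion design. The feature vectors are random stand-ins for BERT and ViT embeddings, the weights are untrained, and all names and dimensions (`gated_fusion`, `classify`, `dim`, `hidden_dim`) are hypothetical.

```python
import math
import random

NUM_CLASSES = 12  # the abstract defines 12 hazard categories


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Return weights @ x + bias for a row-major list-of-lists matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def gated_fusion(text_feat, image_feat, gate_w, gate_b):
    """Blend two same-length feature vectors with a per-dimension gate.

    Assumed form: g = sigmoid(W [t; v] + b), fused = g * t + (1 - g) * v.
    """
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, text_feat + image_feat)]
    fused = [g * t + (1.0 - g) * v
             for g, t, v in zip(gate, text_feat, image_feat)]
    return fused, gate


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, w1, b1, w2, b2):
    """Two-layer perceptron: ReLU hidden layer, then softmax over classes."""
    hidden = [max(0.0, h) for h in linear(w1, b1, fused)]
    return softmax(linear(w2, b2, hidden))


rng = random.Random(0)
dim, hidden_dim = 8, 16  # toy sizes; real BERT/ViT embeddings are much larger

# Random stand-ins for the text (BERT) and image (ViT) feature vectors.
text_feat = [rng.gauss(0, 1) for _ in range(dim)]
image_feat = [rng.gauss(0, 1) for _ in range(dim)]

gate_w, gate_b = random_matrix(dim, 2 * dim, rng), [0.0] * dim
w1, b1 = random_matrix(hidden_dim, dim, rng), [0.0] * hidden_dim
w2, b2 = random_matrix(NUM_CLASSES, hidden_dim, rng), [0.0] * NUM_CLASSES

fused, gate = gated_fusion(text_feat, image_feat, gate_w, gate_b)
probs = classify(fused, w1, b1, w2, b2)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In this sketch a gate value near 1 lets the text feature dominate a given dimension and a value near 0 lets the image feature dominate, which is one way to read "dynamically adjusting the contribution" of each modality.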

Key words: hydropower project, construction safety hazard, multimodal, gated fusion network (GFN), intelligent identification

CLC number: