China Safety Science Journal ›› 2026, Vol. 36 ›› Issue (3): 104-112. doi: 10.16265/j.cnki.issn1003-3033.2026.03.0881

• Safety Technology and Engineering •

An image-text multimodal intelligent identification method for construction safety hazards in hydropower engineering

NIE Benwu1,2,3, CHEN Shu1,3,**, CHEN Yun1,3, TIAN Xueqi2, CAO Kunyu1, LI Zhi4

    1 Hubei Key Laboratory of Construction and Management in Hydropower Engineering, China Three Gorges University, Yichang Hubei 443002, China
    2 Jinshajiang Branch, China Energy Investment Corporation, Chengdu Sichuan 610041, China
    3 College of Hydraulic & Environmental Engineering, China Three Gorges University, Yichang Hubei 443002, China
    4 China Three Gorges Corporation, Wuhan Hubei 430000, China
  • Received: 2025-09-14 Revised: 2025-12-11 Online: 2026-03-31 Published: 2026-09-28
  • Contact: CHEN Shu

Abstract:

To address incomplete unimodal feature representation and low image-text fusion efficiency in construction safety hazard identification for hydropower projects, an image-text multimodal intelligent identification method was proposed. First, 12 categories of construction safety hazards were defined according to hydropower construction characteristics, and an image-text multimodal dataset was established. Second, a bidirectional encoder representations from transformers (BERT) model and a vision transformer (ViT) model were employed to extract hazard text features and image features, respectively. A gated fusion network (GFN) was then introduced to dynamically adjust the contributions of the image and text features and to capture cross-modal correlated feature information, and a multi-layer perceptron was used to improve classification accuracy. Comparative experiments were conducted to verify the model's accuracy and reliability. The results show that the method optimizes the contribution of multimodal features and thereby enhances identification stability. The multimodal hazard identification accuracy reaches 84.99%, an improvement of 1.73% over the text-based model and 12.24% over the image-based model. The proposed approach outperforms existing benchmark models in hazard classification and improves the robustness of intelligent hazard identification.
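
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes one common form of gated fusion network (GFN), in which a sigmoid gate computed from the concatenated text and image features weights the two modalities element-wise, followed by a small multi-layer perceptron over the 12 hazard categories. The abstract does not give the gate equations, so this formulation, the toy dimensions, the random weights, and all function names are assumptions for illustration only; the random vectors merely stand in for BERT and ViT embeddings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_fusion(text_feat, image_feat, gate_w, gate_b):
    """Assumed gate form (not taken from the paper):
    gate  = sigmoid(W_g [text; image] + b_g)
    fused = gate * text + (1 - gate) * image
    """
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, text_feat + image_feat)]
    fused = [g * t + (1.0 - g) * v
             for g, t, v in zip(gate, text_feat, image_feat)]
    return fused, gate


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_classify(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over the classes."""
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]
    return softmax(linear(w2, b2, hidden))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# 12 hazard categories come from the abstract; DIM and HIDDEN are toy sizes.
DIM, HIDDEN, NUM_CLASSES = 8, 16, 12

rng = random.Random(0)
text_feat = [rng.gauss(0, 1) for _ in range(DIM)]   # stand-in for a BERT embedding
image_feat = [rng.gauss(0, 1) for _ in range(DIM)]  # stand-in for a ViT embedding

# Untrained random weights, used only to exercise the data flow.
gate_w, gate_b = random_matrix(DIM, 2 * DIM, rng), [0.0] * DIM
w1, b1 = random_matrix(HIDDEN, DIM, rng), [0.0] * HIDDEN
w2, b2 = random_matrix(NUM_CLASSES, HIDDEN, rng), [0.0] * NUM_CLASSES

fused, gate = gated_fusion(text_feat, image_feat, gate_w, gate_b)
probs = mlp_classify(fused, w1, b1, w2, b2)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Because each gate value lies strictly between 0 and 1, every fused component is a convex combination of the corresponding text and image components, which is one way to read "dynamically adjust the contribution" of the two modalities. In a trained model the gate and classifier weights would be learned jointly rather than drawn at random.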

Key words: hydropower project, construction safety hazard, multimodal, gated fusion network (GFN), intelligent identification

CLC Number: