An image-text multimodal intelligent identification method for construction safety hazards in hydropower engineering

China Safety Science Journal, 2026, Vol. 36, Issue (3): 104-112. doi: 10.16265/j.cnki.issn1003-3033.2026.03.0881

NIE Benwu¹,²,³, CHEN Shu¹,³,*, CHEN Yun¹,³, TIAN Xueqi², CAO Kunyu¹, LI Zhi⁴

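¹ Hubei Key Laboratory of Construction and Management in Hydropower Engineering, China Three Gorges University, Yichang Hubei 443002, China
² Jinshajiang Branch, China Energy Investment Corporation, Chengdu Sichuan 610041, China
³ College of Hydraulic & Environmental Engineering, China Three Gorges University, Yichang Hubei 443002, China
⁴ China Three Gorges Corporation, Wuhan Hubei 430000, China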

Received: 2025-09-14; Revised: 2025-12-11; Published: 2026-03-28
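Corresponding author:
* CHEN Shu (born 1986), male, from Yingshan, Hubei; Ph.D., professor; research interest: construction safety management of hydropower engineering. E-mail: chenshu@ctgu.edu.cn
About the authors:
NIE Benwu (born 1987), male, from Xiaogan, Hubei; Ph.D. candidate, senior engineer; research interest: intelligent construction of hydropower engineering. E-mail: nbwpdp@sina.com
CHEN Yun, associate professor.
LI Zhi, professor-level senior engineer.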
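Funding:
National Natural Science Foundation of China (52479127, 52209163); Youth Program (Class A) of the Natural Science Foundation of Hubei Province (2025AFA074); Innovation and Development Joint Fund of the Natural Science Foundation of Hubei Province (2024AFD153)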

Abstract:

Hazard identification in hydropower construction suffers from two problems: single-modality features are incomplete, and image-text features are fused inefficiently. To address both, an image-text multimodal intelligent identification method was proposed. First, 12 categories of construction safety hazards and their classification features were defined according to the characteristics of hydropower construction. On this basis, an image-text multimodal dataset of construction safety hazards was established. Second, the bidirectional encoder representations from transformers (BERT) model and the vision transformer (ViT) model were used to extract hazard text features and image features, respectively. A gated fusion network (GFN) was then introduced to dynamically adjust the contributions of the text and image features and to capture correlated cross-modal feature information. A multi-layer perceptron (MLP) was used to improve multimodal classification accuracy. Finally, comparative experiments were conducted to verify the accuracy and reliability of the model. The results show that, by enhancing identification stability, the method optimizes the contributions of multimodal hazard features. The gated multimodal model reaches an accuracy of 85.95% and an F1 score of 84.99%. Its accuracy is 1.73% higher than that of the text-only model and 12.24% higher than that of the image-only model (relative improvements). The method outperforms the baseline models in hazard classification and helps improve the robustness of intelligent hazard identification.

Key words: hydropower project, construction safety hazard, multimodal, gated fusion network (GFN), intelligent identification
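The fusion step summarized above can be illustrated with a gated multimodal unit of the kind described by AREVALO et al. [14]. The sketch below is an illustration built on several assumptions and is not the authors' implementation:

- random vectors stand in for the BERT text feature and the ViT image feature;
- the dimensions are toy values, bias terms are omitted, and the weights are untrained;
- the helper names (`gated_fusion`, `mlp_classify`) are hypothetical;
- the 12 outputs are assumed to match the 12 hazard categories.

It uses only the Python standard library.

```python
import math
import random

def matvec(w, x):
    """Multiply a matrix w (list of rows) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def rand_matrix(rows, cols, rng):
    # Hypothetical untrained weights; a real model would learn these.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def gated_fusion(x_text, x_image, params):
    """Gated multimodal unit: h = z * h_t + (1 - z) * h_v (biases omitted)."""
    h_t = [math.tanh(v) for v in matvec(params["W_t"], x_text)]
    h_v = [math.tanh(v) for v in matvec(params["W_v"], x_image)]
    # The gate sees both modalities and sets, per dimension,
    # how much the text branch contributes versus the image branch.
    z = [sigmoid(v) for v in matvec(params["W_z"], x_text + x_image)]
    fused = [zi * ti + (1.0 - zi) * vi for zi, ti, vi in zip(z, h_t, h_v)]
    return fused, z

def mlp_classify(h, w1, w2):
    """Two-layer perceptron with ReLU, returning softmax class probabilities."""
    hidden = [max(0.0, v) for v in matvec(w1, h)]
    logits = matvec(w2, hidden)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
D_TEXT, D_IMAGE, D_FUSED, D_HIDDEN, N_CLASSES = 8, 8, 4, 6, 12  # toy sizes
params = {
    "W_t": rand_matrix(D_FUSED, D_TEXT, rng),
    "W_v": rand_matrix(D_FUSED, D_IMAGE, rng),
    "W_z": rand_matrix(D_FUSED, D_TEXT + D_IMAGE, rng),
}
x_text = [rng.gauss(0, 1) for _ in range(D_TEXT)]    # stand-in for a BERT feature
x_image = [rng.gauss(0, 1) for _ in range(D_IMAGE)]  # stand-in for a ViT feature
fused, gate = gated_fusion(x_text, x_image, params)
probs = mlp_classify(fused, rand_matrix(D_HIDDEN, D_FUSED, rng),
                     rand_matrix(N_CLASSES, D_HIDDEN, rng))
print(len(probs), round(sum(probs), 6))
```

The gate z is computed from both inputs. For each fused dimension, the unit can therefore weight the text branch more heavily when the photo is uninformative, and vice versa. This is one way to realize the dynamic adjustment of feature contributions described above. A plain concatenation baseline would instead pass [x_text; x_image] directly to the MLP.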

CLC number: X948

Cite this article: NIE Benwu, CHEN Shu, CHEN Yun, TIAN Xueqi, CAO Kunyu, LI Zhi. An image-text multimodal intelligent identification method for construction safety hazards in hydropower engineering[J]. China Safety Science Journal, 2026, 36(3): 104-112.

Figures and tables: 11 (Tables 1-7, Figures 1-4)

References

[1]	LU Bing, CHEN Shu, CAO Kunyu, et al. Auxiliary correction methods for categories of potential safety hazards in hydropower project construction[J]. Journal of Hydroelectric Engineering, 2025, 44(4): 42-49.
[2]	CHEN Shu, WANG Dianxue, YANG Yingliu, et al. Semantic matching model of potential safety hazards in hydroelectric project construction[J]. China Safety Science Journal, 2024, 34(12): 40-47. doi: 10.16265/j.cnki.issn1003-3033.2024.12.0795
[3]	YANG Yangrui, PAN Shifeng, LIU Xuemei, et al. Multimodal knowledge graph collaborated with large model for decision recommendation of water projects risk response[J]. Journal of Hydraulic Engineering, 2025, 56(4): 519-530.
[4]	ZHANG Zehui, ZHANG Qianlong, XU Xiaobin, et al. Unsafe behavior recognition model of high climbing workers based on vision[J]. China Safety Science Journal, 2025, 35(2): 144-151. doi: 10.16265/j.cnki.issn1003-3033.2025.02.0278
[5]	ZHONG Botao, SHEN Luoxin, PAN Xing, et al. Visual attention framework for identifying semantic information from construction monitoring video[J]. Safety Science, 2023, 163. doi: 10.1016/j.ssci.2023.106122
[6]	FANG Weili, LOVE P E D, DING Lieyun, et al. Computer vision and deep learning to manage safety in construction: matching images of unsafe behavior and semantic rules[J]. IEEE Transactions on Engineering Management, 2023, 70: 4120-4132. doi: 10.1109/TEM.2021.3093166
[7]	CHEN Shu, ZHANG Chao, CHEN Yun, et al. Model of identifying entities of safety specification for hydropower engineering construction[J]. China Safety Science Journal, 2024, 34(9): 19-26. doi: 10.16265/j.cnki.issn1003-3033.2024.09.0008
[8]	ZHOU Jiayi, ZHENG Xiazhong, TIAN Dan, et al. Multi-label text intelligent classification method for construction safety hazards of hydropower projects[J]. Journal of Hydroelectric Engineering, 2024, 43(11): 114-124.
[9]	LIU Jiajing, FANG Weili, LOVE P E D, et al. Detection and location of unsafe behaviour in digital images: a visual grounding approach[J]. Advanced Engineering Informatics, 2022, 53. doi: 10.1016/j.aei.2022.101688
[10]	TIAN Dan, XU Renle, SHAO Bo, et al. Automatic identification method for construction safety hazards in hydropower engineering with enhanced feature expression[J/OL]. Journal of Hohai University: Natural Sciences Edition: 1-11 [2025-05-28]. https://kns.cnki.net/kcms2/article.html.
[11]	CHEN Yansong, ZHANG Le, ZHANG Leihan, et al. Multimodal sentiment analysis method based on cross-modal attention and gated unit fusion network[J]. Data Analysis and Knowledge Discovery, 2024, 8(7): 67-76. doi: 10.11925/infotech.2096-3467.2023.0591
[12]	WANG Yiheng, XIAO Bo, BOUFERGUENE A, et al. Proactive safety hazard identification using visual-text semantic similarity for construction safety management[J]. Automation in Construction, 2024, 166. doi: 10.1016/j.autcon.2024.105602
[13]	KIM H, YI J S. Image generation of hazardous situations in construction sites using text-to-image generative model for training deep neural networks[J]. Automation in Construction, 2024, 166. doi: 10.1016/j.autcon.2024.105615
[14]	AREVALO J, SOLORIO T, MONTES M, et al. Gated multimodal networks[J]. Neural Computing and Applications, 2020, 32(14): 10209-10228. doi: 10.1007/s00521-019-04559-1
[15]	CAO Kunyu, CHEN Shu, CHEN Yun, et al. Decision analysis of safety risks pre-control measures for falling accidents in mega hydropower engineering driven by accident case texts[J]. Reliability Engineering & System Safety, 2025, 261. doi: 10.1016/j.ress.2025.111120
[16]	TIAN Dan, LIU Hao, CHEN Shu, et al. Human error analysis for hydraulic engineering: comprehensive system to reveal accident evolution process with text knowledge[J]. Journal of Construction Engineering and Management, 2022, 148(9). doi: 10.1061/(ASCE)CO.1943-7862.0002366

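Table 1
Hazard type	Hazard characteristics
Fall from height	Hazard of falling from a height of 2 m or more (inclusive) above the fall-height reference plane
Electric shock	Hazard of electric current passing through the human body and causing injury or death
Object strike	Hazard of an uncontrolled object (e.g., tools, materials) striking the human body and causing injury
Lifting injury	Hazard of injury caused by lifting gear, lifted loads, etc.
Mechanical injury	Hazard of pinching, cutting, or similar injuries caused by moving parts of machinery (e.g., gears, belts)
Fire	Hazard of combustion caused by accumulated flammable materials, electrical short circuits, or poor control of open flames
Vehicle injury	Hazard of injury from collision with, or being run over by, motor vehicles (e.g., dump trucks, excavators)
Explosion	Hazard of damage caused by sudden energy release from pressure vessels, hazardous chemicals, etc.
Collapse	Hazard of earthwork, formwork, or building structures collapsing
Equipment damage	Hazard of abnormal damage to production equipment, or of chain effects triggered by mechanical failure, overloaded operation, or insufficient maintenance
Civilized construction	Indirect hazards caused by disorderly, untidy site management (e.g., piled debris, blocked passageways)
Other accidents	Other risks or hazards not covered by the categories above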

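Table 2
No.	Image description	Hazard description	Hazard type	Hazard photo
1	An excavator's arm parked in the air	Excavator arm not lowered to the ground when work stopped	Lifting injury
2	An open power box without a grounding connection	Secondary distribution box with one switch serving multiple machines, and outgoing cables not grounded	Electric shock
3	A scaffold without connecting couplers	Couplers not used where the scissor-brace steel tubes of the slope-support bent frame intersect the vertical poles	Collapse
︙	︙	︙	︙	︙
2282	Worker at height without a fastened safety harness	Worker at height upstream of the drainage channel without a fastened safety harness	Fall from height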
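Each sample in the table above pairs two text fields with a photo and one of the 12 hazard labels defined in the hazard-category table. The minimal record sketch below is built on assumptions. The field names, the `photo_path` value, and the label-to-index order are all hypothetical. The normal (hazard-free) photos listed in the dataset split are left out.

```python
from dataclasses import dataclass

# The 12 hazard categories from the hazard-category table (order is an assumption).
HAZARD_TYPES = (
    "Fall from height", "Electric shock", "Object strike", "Lifting injury",
    "Mechanical injury", "Fire", "Vehicle injury", "Explosion", "Collapse",
    "Equipment damage", "Civilized construction", "Other accidents",
)
LABEL_TO_ID = {name: i for i, name in enumerate(HAZARD_TYPES)}

@dataclass(frozen=True)
class HazardSample:
    """One image-text sample; all field names are hypothetical."""
    sample_id: int
    image_description: str
    hazard_description: str
    hazard_type: str
    photo_path: str  # hypothetical location of the hazard photo

    def __post_init__(self):
        # Reject labels outside the 12 defined categories.
        if self.hazard_type not in LABEL_TO_ID:
            raise ValueError(f"unknown hazard type: {self.hazard_type!r}")

    @property
    def label_id(self) -> int:
        """Integer class index used by a classifier."""
        return LABEL_TO_ID[self.hazard_type]

sample = HazardSample(
    sample_id=2282,
    image_description="Worker at height without a fastened safety harness",
    hazard_description=("Worker at height upstream of the drainage channel "
                        "without a fastened safety harness"),
    hazard_type="Fall from height",
    photo_path="photos/2282.jpg",
)
print(sample.label_id)
```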

Table 3
Hazard type	Training set	Validation set	Test set	Total	Share/%
Explosion	57	16	8	81	4
Vehicle injury	15	4	2	22	1
Electric shock	505	144	72	721	32
Fall from height	192	55	27	274	12
Fire	205	59	29	293	13
Mechanical injury	41	12	6	58	3
Other accidents	146	42	21	209	9
Lifting injury	85	24	12	121	5
Equipment damage	20	6	3	29	1
Collapse	41	12	6	59	3
Civilized construction	246	70	35	352	15
Object strike	44	13	6	63	3
Normal photos	700	200	100	1000	-
Total	2297	656	328	3282	100
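The share column appears to be each category's total divided by the 2282 hazard samples (normal photos excluded), rounded to a whole percent. This reading is inferred from the numbers rather than stated in the table. The check below reproduces the column from the totals:

```python
# Category totals from the dataset-split table (normal photos excluded).
totals = {
    "Explosion": 81, "Vehicle injury": 22, "Electric shock": 721,
    "Fall from height": 274, "Fire": 293, "Mechanical injury": 58,
    "Other accidents": 209, "Lifting injury": 121, "Equipment damage": 29,
    "Collapse": 59, "Civilized construction": 352, "Object strike": 63,
}
hazard_total = sum(totals.values())  # number of hazard samples
# Share of hazard samples, rounded to a whole percent.
shares = {name: round(100 * n / hazard_total) for name, n in totals.items()}
print(hazard_total, shares["Electric shock"], shares["Civilized construction"])
```

The rounded shares add up to 101. The 100 in the total row therefore presumably refers to the unrounded values.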

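Table 4
Model	Input type	Advantages	Disadvantages	Typical applications
BERT[7]	Text	Bidirectional contextual pretraining	High computational cost; weak generation capability	Text classification, named entity recognition, reading comprehension
ViT[9]	Image	Serialization (image patches as a token sequence) + global self-attention	High computational complexity	Large-scale image classification/detection
BERT-ViT[12]	Image-text	Cross-modal interaction (cross-attention/feature fusion)	Complex training, data scarcity, modality imbalance	Image-text interaction tasks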
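The "serialization" in the ViT row refers to cutting an image into fixed-size patches and flattening each patch into one token. The pure-Python sketch below assumes a 224 × 224 RGB input and 16 × 16 patches, which is the common ViT-Base/16 setting; the paper's own input size is not given here. Under that assumption it shows how long the sequence is that global self-attention then operates on.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):          # patches in row-major order
        for left in range(0, w, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])  # append all channels of one pixel
            tokens.append(vec)
    return tokens

H = W = 224; C = 3; P = 16  # assumed ViT-Base/16 input settings
image = [[[0.0] * C for _ in range(W)] for _ in range(H)]
tokens = patchify(image, P)
print(len(tokens), len(tokens[0]))  # 196 768
```

In a standard ViT, each flattened patch is then linearly projected. A [CLS] token is prepended and position embeddings are added before the transformer encoder.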

Table 5
Model	Accuracy/%	Precision/%	Recall/%	F1/%
Text model	84.49	84.43	69.96	84.21
Image model	76.58	76.82	57.61	75.32
Image-text model	85.30	87.92	70.44	84.45
Gated-fusion image-text model	85.95	85.98	71.01	84.99
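The abstract quotes improvements of 1.73% over the text model and 12.24% over the image model. These correspond to relative gains in accuracy from the model comparison table, (a_new - a_base) / a_base, not percentage-point differences. A quick check:

```python
# Accuracy (%) from the model comparison table.
accuracy = {
    "text": 84.49,
    "image": 76.58,
    "image_text": 85.30,
    "gated_fusion": 85.95,
}

def relative_gain(new, base):
    """Relative improvement in percent: (new - base) / base * 100."""
    return (new - base) / base * 100

for base in ("text", "image"):
    abs_gain = accuracy["gated_fusion"] - accuracy[base]
    rel_gain = relative_gain(accuracy["gated_fusion"], accuracy[base])
    print(f"vs {base}: +{abs_gain:.2f} points, +{rel_gain:.2f}% relative")
```

This prints absolute gains of 1.46 and 9.37 percentage points and relative gains of 1.73% and 12.24%. In the same table, 84.99 is the F1 value of the gated-fusion model, whose accuracy is 85.95.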

Table 6
Hazard type	Precision	Recall	F1
Other accidents	0.6000	0.2500	0.3529
Collapse	0.3077	0.3077	0.3077
Civilized construction	0.6855	0.8854	0.7727
Mechanical injury	0.8750	0.8750	0.8750
Fire	0.9138	0.9298	0.9217
Explosion	0.8750	0.8750	0.8750
Object strike	1.0000	0.4286	0.6000
Electric shock	0.9051	0.9254	0.9151
Equipment damage	1.0000	0.5714	0.7273
Lifting injury	0.8929	0.9259	0.9091
Vehicle injury	0.6667	0.4000	0.5000
Fall from height	0.8571	0.8571	0.8571
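Each F1 value in the per-class table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes several rows from the tabulated P and R. It checks only the arithmetic and assumes nothing about how the model was trained. Because P and R are themselves rounded to four decimals, the results agree with the reported F1 to within about 1e-4.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for selected rows of the per-class table.
rows = {
    "Other accidents": (0.6000, 0.2500, 0.3529),
    "Civilized construction": (0.6855, 0.8854, 0.7727),
    "Object strike": (1.0000, 0.4286, 0.6000),
    "Equipment damage": (1.0000, 0.5714, 0.7273),
    "Vehicle injury": (0.6667, 0.4000, 0.5000),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1(p, r):.4f}, reported {reported:.4f}")
```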

Table 7
Hazard type	Text model	Image model	Image-text model	Gated-fusion image-text model
Lifting injury	0.9259	0.7633	0.9240	0.9093
Electric shock	0.9086	0.8347	0.9092	0.9152
Fire	0.8917	0.8752	0.9204	0.9218
Mechanical injury	0.8750	0.8024	0.9361	0.8750
Explosion	0.7747	0.6066	0.8336	0.8750
Civilized construction	0.7763	0.7228	0.7572	0.7812
Equipment damage	0.7662	0.6762	0.7662	0.7662
Fall from height	0.8454	0.5637	0.8502	0.8571
Vehicle injury	0.7833	0.5222	0.6000	0.5222
Object strike	0.4492	0.3381	0.6190	0.6762
Other accidents	0.3955	0.3352	0.4837	0.4010
Collapse	0.1981	0.2789	0.4446	0.3077
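The four-model comparison table does not name its metric in the column headers. Its per-class winners can therefore be read off only as "highest value in each row". The sketch below does this and treats equal values as ties.

```python
# Per-class values from the four-model comparison table.
MODELS = ("text", "image", "image_text", "gated_fusion")
table = {
    "Lifting injury": (0.9259, 0.7633, 0.9240, 0.9093),
    "Electric shock": (0.9086, 0.8347, 0.9092, 0.9152),
    "Fire": (0.8917, 0.8752, 0.9204, 0.9218),
    "Mechanical injury": (0.8750, 0.8024, 0.9361, 0.8750),
    "Explosion": (0.7747, 0.6066, 0.8336, 0.8750),
    "Civilized construction": (0.7763, 0.7228, 0.7572, 0.7812),
    "Equipment damage": (0.7662, 0.6762, 0.7662, 0.7662),
    "Fall from height": (0.8454, 0.5637, 0.8502, 0.8571),
    "Vehicle injury": (0.7833, 0.5222, 0.6000, 0.5222),
    "Object strike": (0.4492, 0.3381, 0.6190, 0.6762),
    "Other accidents": (0.3955, 0.3352, 0.4837, 0.4010),
    "Collapse": (0.1981, 0.2789, 0.4446, 0.3077),
}

def winners(values):
    """Return every model whose value equals the row maximum (ties kept)."""
    best = max(values)
    return [m for m, v in zip(MODELS, values) if v == best]

wins = {m: 0 for m in MODELS}
for hazard, values in table.items():
    for m in winners(values):
        wins[m] += 1
print(wins)
```

Under this reading, the gated-fusion model holds the highest or tied-highest value in 7 of the 12 categories. The plain image-text model leads in 4 (mechanical injury, equipment damage, other accidents, collapse), and the text model in 3 (lifting injury, equipment damage, vehicle injury). Equipment damage is a three-way tie.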
