China Safety Science Journal ›› 2025, Vol. 35 ›› Issue (3): 204-211. doi: 10.16265/j.cnki.issn1003-3033.2025.03.0223

• Public Safety •

Gas knowledge bidirectional encoder representations from transformers model based on knowledge injection

LIU Xiaoyu, ZHUANG Yufeng**, ZHAO Xinghao, WANG Kefan, ZHANG Guokai

  1. School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2024-10-14 Revised: 2024-12-18 Published: 2025-03-28
  • Corresponding author:
    ** ZHUANG Yufeng (1972—), male, a native of Shanghai, Ph.D., professor, mainly engaged in research on intelligent control and equipment safety. E-mail:
  • About the authors:
    LIU Xiaoyu (2000—), female, a native of Tianjin, master's student, whose main research interests are text mining and logistics information technology and their engineering applications. E-mail:
  • Supported by:
    National Natural Science Foundation of China (52478123)

Abstract:

In order to enhance emergency management in the field of gas pipeline networks, a gas knowledge bidirectional encoder representations from transformers (Gas-kBERT) model was proposed. The model combined gas pipeline network domain data augmented with the Chat Generative Pre-trained Transformer (ChatGPT) with the Chinese Gas Language Understanding subject-predicate-object triplet dataset (CGLU-Spo) and related corpora constructed for this field. By altering the model's masking (MASK) mechanism, domain knowledge was successfully injected into the model. Considering the professionalism and specificity of the gas pipeline network field, Gas-kBERT was pre-trained on corpora of different scales and contents, and fine-tuned on named entity recognition and text classification tasks within this field. Experimental results showed that, compared with the general bidirectional encoder representations from transformers (BERT) model, Gas-kBERT achieved significant F1-score improvements on text mining tasks in the gas pipeline network field: the F1-score increased by 29.55% on named entity recognition, and by up to 83.33% on text classification. These results demonstrate that the Gas-kBERT model performs exceptionally well on text mining tasks in the gas pipeline network field.
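The abstract gives no implementation details of the modified MASK mechanism. As a minimal, purely illustrative sketch of one common way such knowledge injection is realized (assuming entity-level masking driven by domain terms harvested from the CGLU-Spo triplets; the entity list, function names, and probabilities below are hypothetical and are not the authors' code):

    import random

    # Hypothetical domain terms, e.g. entities harvested from the
    # CGLU-Spo subject-predicate-object triplets (illustrative values only).
    DOMAIN_ENTITIES = {"燃气管网", "调压器", "泄漏报警"}

    MASK = "[MASK]"

    def entity_level_mask(tokens, base_prob=0.15, entity_boost=2.0, rng=random):
        """Mask pre-segmented tokens for masked-language-model pre-training.

        Unlike vanilla BERT, which masks sub-tokens independently, a domain
        entity here is (a) kept as a single maskable unit and (b) masked with
        a higher probability, so the model must predict whole gas-domain terms
        from context -- one plausible reading of the changed MASK mechanism.
        """
        masked_tokens, labels = [], []
        for tok in tokens:
            prob = base_prob * entity_boost if tok in DOMAIN_ENTITIES else base_prob
            if rng.random() < prob:
                masked_tokens.append(MASK)
                labels.append(tok)        # prediction target for the MLM loss
            else:
                masked_tokens.append(tok)
                labels.append(None)       # ignored by the MLM loss
        return masked_tokens, labels

    if __name__ == "__main__":
        random.seed(7)
        sample = ["巡检", "发现", "燃气管网", "存在", "泄漏"]
        print(entity_level_mask(sample, base_prob=0.3))

Under this scheme a gas-domain term is never partially masked, so recovering it requires domain knowledge rather than reassembly from unmasked fragments.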

Key words: gas pipeline networks, gas knowledge bidirectional encoder representations from transformers (Gas-kBERT) model, natural language processing (NLP), knowledge injection, bidirectional encoder representations from transformers (BERT) model

CLC number: