China Safety Science Journal ›› 2026, Vol. 36 ›› Issue (1): 26-34. doi: 10.16265/j.cnki.issn1003-3033.2026.01.0840

• Safety Science Theory and Methods •

Construction and application of intelligent question-answering model for accident investigation reports based on DeepSeek and RAG

LI Hua1, WU Lizhou1,2, LI Xinhong1,**, ZHANG Yue1, FENG Yao1, QIN Ziyun1

  1. School of Resources Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
  2. College of Safety Science and Engineering, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China
  • Received: 2025-09-14 Revised: 2025-11-21 Published: 2026-02-08
  • Corresponding author:
    ** LI Xinhong (b. 1991), male, from Zhenyuan, Gansu; PhD, professor, doctoral supervisor. His research focuses on energy safety engineering, safety informatization technology, and risk assessment and control. E-mail:
  • About the author:
    LI Hua (b. 1979), female, from Xi'an, Shaanxi; PhD, associate professor, master's supervisor. Her research focuses on enterprise risk assessment and safety management, building safety monitoring and surveillance, and public safety and emergency management. E-mail:
  • Funding:
    2025 National-level University Innovation Training Program project (202510703088)

Abstract:

To address the constraints that limited corpus resources, restricted input capacity, and data privacy place on applying large language models (LLMs) in safety engineering, a localized accident question-answering model combining DeepSeek with a retrieval-augmented generation (RAG) mechanism was constructed. It is intended to provide intelligent parsing of and knowledge services for complex texts, thereby supporting safety management decision-making. A semantic-feature corpus was built from accident investigation reports and laws and regulations released by government emergency management systems, and PaddleOCR, LayoutLMv3, YOLOv8, and related technologies were used for document structure reconstruction and semantic modeling. The model comprises four stages (document parsing, semantic alignment, knowledge-base construction, and hybrid retrieval) and supports causal-chain extraction, regulation matching, and semantic mapping. The results show that, compared with the Deepseek-r1:32b model without the RAG mechanism, the model improved question-answering scores by 7.7% in automated scoring and 17.6% in human evaluation, and its response-speed and stability metrics were also numerically higher than those of the baseline. Model performance is still affected by the local parameter scale and the knowledge-updating mechanism, but the experiments indicate that it can fulfil its intended functions in the tasks studied.

Key words: DeepSeek, large language model (LLM), retrieval-augmented generation (RAG), accident investigation report, knowledge base
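
The abstract names hybrid retrieval as the final stage but does not say how the retrieval signals are combined. As an illustration only, the sketch below assumes a keyword ranking and a vector-similarity ranking fused by reciprocal rank fusion. The function names, the character-trigram vectors (a toy stand-in for a learned embedding), the sample documents, and the constant `k = 60` are all assumptions and are not taken from the paper. Whitespace tokenization is used for brevity; Chinese reports would need a word segmenter.

```python
from collections import Counter
from math import sqrt


def keyword_score(query: str, doc: str) -> float:
    """Lexical signal: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts, a toy stand-in for a learned embedding."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(scores: list[float]) -> list[int]:
    """Return 1-based ranks (best score gets rank 1); ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, doc_index in enumerate(order, start=1):
        ranks[doc_index] = position
    return ranks


def hybrid_retrieve(query: str, docs: list[str], top_k: int = 2, k: int = 60) -> list[int]:
    """Fuse the lexical and vector rankings with reciprocal rank fusion."""
    lexical = rank([keyword_score(query, d) for d in docs])
    q_vec = trigram_vector(query)
    dense = rank([cosine(q_vec, trigram_vector(d)) for d in docs])
    fused = [1 / (k + lex) + 1 / (k + den) for lex, den in zip(lexical, dense)]
    order = sorted(range(len(docs)), key=lambda i: (-fused[i], i))
    return order[:top_k]


# Hypothetical knowledge-base snippets, invented for this example.
docs = [
    "gas explosion in coal mine caused by ventilation failure",
    "scaffold collapse at construction site due to overloading",
    "regulation on work safety requires ventilation inspection",
]
top = hybrid_retrieve("ventilation failure explosion", docs)
```

Here `top` holds the indices of the highest-ranked snippets, which a RAG pipeline would pass to the language model as context. Fusing ranks rather than raw scores avoids having to calibrate the two signals against each other.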

CLC number: