China Safety Science Journal ›› 2022, Vol. 32 ›› Issue (2): 115-120. doi: 10.16265/j.cnki.issn1003-3033.2022.02.016

• Safety Engineering Technology •

Research on optimization of disaster triplet information extraction based on BERT

SONG Dunjiang1, YANG Lin2, ZHONG Shaobo3,**

  1 Institute of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
  2 Department of Computer Science and Technology, Taiyuan University, Taiyuan, Shanxi 030032, China
  3 Urban Operation Research Department, Beijing Research Center of Urban Systems Engineering, Beijing 100035, China
  • Received: 2021-11-22 Revised: 2022-01-15 Online: 2022-08-18 Published: 2022-08-28
  • Corresponding author: ZHONG Shaobo
  • About the author:

    SONG Dunjiang (born 1979), male, from Huangshi, Hubei; Ph.D., associate research fellow; mainly engaged in research on sustainable development strategy.

  • Funding:
    National Key Research and Development Program of China (2019YFF0301300); National Natural Science Foundation of China (41471338); Key Cultivation Project of the Institute of Science and Development, Chinese Academy of Sciences; Beijing Municipal Science and Technology Program (Z191100001419002)

Abstract:

To extract disaster triplet information quickly and accurately from online media text, natural language processing (NLP) techniques were used to study disaster triplet extraction and the optimization of its algorithm. A pre-trained bidirectional encoder representations from transformers (BERT) language model was applied to a case study of triplet extraction for geological disasters. Because the multi-head attention (MHA) mechanism underlying the model can cause a "low-rank bottleneck", the model was optimized by increasing its key-size. The results show that the proposed method significantly improves the fault tolerance and precision with which key information, such as the type, location, and time of a geological disaster, is extracted from news reports and similar text. The extracted information supports analysis of the spatial distribution and trends of geological and other disasters, and in turn provides scientific analysis and decision support for disaster emergency management, including emergency plan preparation, optimal allocation of emergency resources, and regional monitoring and early warning.

Key words: natural language processing (NLP), bidirectional encoder representations from transformers (BERT), low-rank bottleneck, multi-head attention (MHA), disaster information
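The "low-rank bottleneck" named in the abstract can be illustrated with a small numerical sketch. The abstract does not give the derivation, so the code below assumes the usual formulation: within one attention head, the pre-softmax logits are Q·Kᵀ, where Q and K each have key-size columns, so the rank of the logits matrix can never exceed the key-size. All function names, the toy dimensions (sequence length 12, model width 16), and the random integer weights are hypothetical choices for illustration and are not taken from the paper.

```python
import random
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def rank(m):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def random_matrix(rng, n_rows, n_cols):
    """Small random integer matrix (toy stand-in for embeddings or weights)."""
    return [[rng.randint(-3, 3) for _ in range(n_cols)] for _ in range(n_rows)]


def logits_rank(seq_len, d_model, key_size, seed=0):
    """Rank of the pre-softmax attention logits Q K^T for a single head."""
    rng = random.Random(seed)
    x = random_matrix(rng, seq_len, d_model)     # token representations
    w_q = random_matrix(rng, d_model, key_size)  # query projection
    w_k = random_matrix(rng, d_model, key_size)  # key projection
    q = matmul(x, w_q)                           # seq_len x key_size
    k = matmul(x, w_k)                           # seq_len x key_size
    return rank(matmul(q, transpose(k)))         # seq_len x seq_len logits


if __name__ == "__main__":
    # The rank is bounded by min(seq_len, d_model, key_size), so a small
    # key_size caps it well below seq_len; a larger key_size lifts that cap.
    for key_size in (4, 16):
        print(key_size, logits_rank(seq_len=12, d_model=16, key_size=key_size))
```

With a key-size of 4 the logits matrix for 12 tokens can have rank at most 4, whereas a key-size of 16 allows it to reach the full rank of 12. This is the sense in which increasing the key-size relaxes the bottleneck; how the authors implemented the change inside BERT is not stated in the abstract.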