China Safety Science Journal, 2022, Vol. 32, Issue (12): 53-62. doi: 10.16265/j.cnki.issn1003-3033.2022.12.2727

• Safety science theory and safety system science •

Named entity recognition of HSE inspection minutes based on data enhancement

XIA Zhanjie, ZHANG Beike, GAO Dong**

  1. School of Information and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2022-07-30 Revised: 2022-10-23 Online: 2022-12-28 Published: 2023-06-28
  • Contact: GAO Dong

Abstract:

Deep learning models applied to text mining of safety inspection minutes face small data sets, unevenly distributed sample data, and poor named entity recognition (NER) performance. To address these problems, a new data enhancement method for NER was proposed. First, the named entities in the data set were separated from the rest of the text, and named entities of the same kind were randomly substituted for one another. This keeps data enhancement from damaging the information carried by the named entities and also makes their distribution more uniform. Then, the noise data and scale parameters of the remaining parts of the text were optimized, which further improved NER performance. Finally, the separated data were automatically labeled and recombined, avoiding the need to label a large amount of data manually. The results show that the method can quickly address the small amount of data and the uneven distribution of named entities in the data set. Compared with the latest AEDA (An Easier Data Augmentation) method, it achieves better recognition results on data sets such as health, safety and environment (HSE) inspection minutes, raising the model's comprehensive evaluation index on one-fold expanded data from 92.83% to 97.23%. In addition, the spatial distribution and strong association rules of safety hazards in the construction process can be obtained.
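The same-kind entity replacement and automatic relabeling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes BIO-tagged token sequences, the function names (`extract_entities`, `build_entity_pool`, `augment`) and the `LOC`/`HAZ` labels are hypothetical, and the noise and scale-parameter optimization of the non-entity text is left out because the abstract gives no details of it.

```python
import random
from collections import defaultdict


def extract_entities(tokens, tags):
    """Split a BIO-tagged sentence into (entity_type or None, token list) segments."""
    segments = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity segment.
            segments.append((tag[2:], [tok]))
        elif tag.startswith("I-") and segments and segments[-1][0] == tag[2:]:
            # An I- tag continues the entity of the same type.
            segments[-1][1].append(tok)
        elif segments and segments[-1][0] is None:
            # Non-entity token (or stray I- tag): extend the current plain segment.
            segments[-1][1].append(tok)
        else:
            segments.append((None, [tok]))
    return segments


def build_entity_pool(corpus):
    """Collect the distinct entity mentions of each type across the corpus."""
    pool = defaultdict(list)
    for tokens, tags in corpus:
        for etype, toks in extract_entities(tokens, tags):
            if etype is not None and toks not in pool[etype]:
                pool[etype].append(toks)
    return pool


def augment(tokens, tags, pool, rng):
    """Swap each entity for a random same-type mention and rebuild the BIO tags."""
    new_tokens, new_tags = [], []
    for etype, toks in extract_entities(tokens, tags):
        if etype is None:
            new_tokens.extend(toks)
            new_tags.extend(["O"] * len(toks))
        else:
            # Sampling from the deduplicated pool gives rare mentions the
            # same chance as frequent ones.
            repl = rng.choice(pool[etype])
            new_tokens.extend(repl)
            # Labels are regenerated from the segment type, so no manual
            # annotation is needed for the new sentence.
            new_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(repl) - 1))
    return new_tokens, new_tags


# Toy corpus with hypothetical labels: LOC (location), HAZ (hazard).
corpus = [
    (["tank", "area", "has", "missing", "guardrail"],
     ["B-LOC", "I-LOC", "O", "B-HAZ", "I-HAZ"]),
    (["warehouse", "has", "oil", "leak"],
     ["B-LOC", "O", "B-HAZ", "I-HAZ"]),
]
pool = build_entity_pool(corpus)
rng = random.Random(0)
new_tokens, new_tags = augment(*corpus[0], pool, rng)
print(list(zip(new_tokens, new_tags)))
```

Because replacement happens at the level of whole entity segments and the tags are rebuilt from the segment type, entity boundaries stay intact and the augmented sentence is labeled without manual work, which is the property the abstract emphasizes.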

Key words: data enhancement, health safety environment (HSE), inspection minutes, named entity recognition (NER), hidden danger, text mining