China Safety Science Journal, 2017, Vol. 27, Issue (8): 156-161. doi: 10.16265/j.cnki.issn1003-3033.2017.08.027

Safety Social Engineering Work

Research on text categorization for hidden dangers based on Bigram

CHEN Xiaoci, TAN Zhanglu, SHAN Fei, GAO Qing   

  1. School of Management, China University of Mining & Technology, Beijing 100083, China
  Received: 2017-05-04; Revised: 2017-07-07; Online: 2017-08-20; Published: 2020-10-13

Abstract: Traditional text classification research is poorly targeted at specific application domains and often performs badly in practice. Hidden-danger records kept by enterprises are short texts, which makes it difficult to select suitable feature units. To extract and analyze useful information from large volumes of hidden-danger text efficiently, a new text categorization method for hidden dangers was developed that combines the support vector machine (SVM) data mining algorithm with Bigram strings as feature units. The method was verified experimentally on all hidden-danger records of Sima Coal Industry Co., Ltd. of Lu'an Group from 2009 to 2015. The results show that the new method achieves a higher precision rate, recall rate and F-measure, and markedly improves categorization accuracy compared with traditional methods.

Key words: hidden danger, Bigram, feature unit, support vector machine (SVM), text categorization
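
The abstract combines two ingredients, overlapping two-character (Bigram) strings as feature units and an SVM classifier, and evaluates the result with precision, recall and F-measure. The Python sketch below illustrates that kind of pipeline under stated assumptions; it is not the authors' implementation. The function names, the raw term-frequency weighting, the toy records and the two-class setup are all hypothetical. A linear SVM without a bias term, trained by Pegasos stochastic sub-gradient descent, is swapped in because the abstract does not specify the kernel or solver.

```python
import random
from collections import Counter


def bigrams(text):
    """Split a string into overlapping two-character units (character Bigrams)."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def build_vocab(docs):
    """Map every Bigram seen in the training documents to a feature index."""
    vocab = {}
    for doc in docs:
        for bg in bigrams(doc):
            vocab.setdefault(bg, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Sparse term-frequency vector {index: count}; Bigrams outside vocab are dropped."""
    counts = Counter(bg for bg in bigrams(doc) if bg in vocab)
    return {vocab[bg]: float(n) for bg, n in counts.items()}


def score(w, x):
    return sum(w[j] * v for j, v in x.items())


def train_linear_svm(xs, ys, dim, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias term) trained with Pegasos
    stochastic sub-gradient descent. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * score(w, xs[i]) < 1  # hinge loss active for this record?
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage step
            if violated:
                for j, v in xs[i].items():
                    w[j] += eta * ys[i] * v
    return w


def predict(w, x):
    return 1 if score(w, x) > 0 else -1


def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall and F-measure (F1) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical toy records (NOT from the Sima dataset): +1 = gas, -1 = mechanical.
train_docs = ["工作面瓦斯浓度超限", "回风巷瓦斯传感器故障", "瓦斯抽采管路漏气",
              "皮带机托辊损坏", "皮带机跑偏未处理", "刮板输送机链条松动"]
train_y = [1, 1, 1, -1, -1, -1]
vocab = build_vocab(train_docs)
w = train_linear_svm([vectorize(d, vocab) for d in train_docs], train_y, len(vocab))

test_docs = ["瓦斯浓度超限未撤人", "皮带机托辊缺失"]
test_y = [1, -1]
pred = [predict(w, vectorize(d, vocab)) for d in test_docs]
print(pred, precision_recall_f(test_y, pred))  # [1, -1] (1.0, 1.0, 1.0)
```

Character Bigrams need no word segmentation, which is one plausible reason they suit short Chinese hazard records. Real hazard categories are usually more than two; a one-vs-rest scheme (one binary SVM per category) would be a natural extension of this sketch.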
