China Safety Science Journal (中国安全科学学报) ›› 2017, Vol. 27 ›› Issue (8): 156-161. doi: 10.16265/j.cnki.issn1003-3033.2017.08.027

• Safety Social Engineering •

Research on text categorization for hidden dangers based on Bigram

CHEN Xiaoci, Prof. TAN Zhanglu, SHAN Fei, GAO Qing

  1. School of Management, China University of Mining & Technology (Beijing), Beijing 100083, China
  • Received: 2017-05-04 Revised: 2017-07-07 Online: 2017-08-20 Published: 2020-10-13
  • About the author: CHEN Xiaoci (born 1991), male, from Ningbo, Zhejiang; master's student whose research interests include visual management and knowledge visualization. E-mail: 982599508@qq.com.
  • Supported by:
    National Natural Science Foundation of China (61471362).

Abstract: Traditional text classification research is not tailored to hidden danger records and performs poorly when applied to them, and enterprise hidden danger texts are short, which makes feature units difficult to select. To extract and analyze useful information from large volumes of hidden danger text efficiently, and to better track how hidden dangers arise and change, a text categorization method for hidden dangers is proposed that uses Bigram two-character strings as feature units in combination with the support vector machine (SVM) data mining algorithm. The method was verified experimentally on the hidden danger records of Sima Coal Industry Co., Ltd. of Lu'an Group for 2009-2015. The results show that the new method achieves relatively high precision, recall and F-measure, and that it significantly improves categorization accuracy compared with traditional methods.

Key words: hidden danger, Bigram, feature unit, support vector machine (SVM), text categorization
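
To make the feature unit concrete, the sketch below shows one way overlapping two-character strings could be extracted from a short record and counted, together with the per-class precision, recall and F-measure named in the abstract. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the whitespace stripping, the sample strings and the choice of the balanced F1 form of the F-measure are all hypothetical, and the SVM training step (which would need a library such as scikit-learn) is not shown.

```python
from collections import Counter


def char_bigrams(text):
    """Return every overlapping two-character substring of `text`.

    Assumption: whitespace is dropped first; the paper's own
    preprocessing is not described in the abstract.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def bigram_features(text):
    """Map one record to a sparse bag-of-bigrams count vector."""
    return Counter(char_bigrams(text))


def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 from two parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical four-character record, not taken from the paper's data set.
print(char_bigrams("顶板支护"))
```

Because every adjacent character pair becomes a feature, no word segmentation step is needed, which is one plausible reason the approach suits very short records. The resulting count vectors could then be passed to any classifier that accepts sparse features.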
