China Safety Science Journal ›› 2024, Vol. 34 ›› Issue (2): 37-44. doi: 10.16265/j.cnki.issn1003-3033.2024.02.0121

• Safety social science and safety management •

Short text classification of civil aviation intelligent supervision based on character-word fusion

WANG Xin1, GAN Zurui1, XU Yaxi2,**, SHI Ke3, ZHENG Tao1

  1 School of Computer, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China
  2 School of Economics and Management, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China
  3 Institute of Civil Aviation Supervisor Training, Civil Aviation Flight University of China, Guanghan Sichuan 618307, China
  • Received: 2023-08-14 Revised: 2023-11-20 Online: 2024-02-28 Published: 2024-08-28
  • Contact: XU Yaxi

Abstract:

To address the inefficiency of manually classifying and analyzing inspection records in civil aviation supervision, a dual-channel feature extraction model for short text classification is proposed. The model combines data augmentation with character-word vector fusion and targets classification problems concerning people, equipment and facilities, institutional procedures, and institutional responsibilities in civil aviation supervision matters. To mitigate class imbalance, data augmentation algorithms generate new samples by transforming the original texts, thereby balancing the sample sizes across categories. Word vectors and character vectors are fused at the character level, yielding character vectors that retain word-level features. These fused character vectors are then fed into TextCNN and BiLSTM for feature extraction along different dimensions. By extracting features from both local and global perspectives, the dual-channel approach aims to capture comprehensive and effective information from the inspection record dataset. Experimental results on the inspection record dataset for civil aviation regulatory matters show that the proposed model achieves an accuracy of 0.9837 and an F1 score of 0.9836. Compared with some existing word embedding and character embedding models, accuracy improves by 0.4%. Compared with commonly used single-channel models, accuracy improves by 3%, which validates the effectiveness and comprehensiveness of the features extracted by the dual-channel model.
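
The character-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion operator, so concatenation is assumed here, and the helper names (`build_embeddings`, `fuse_char_word`), the random stand-in embeddings, the vector sizes, and the example sentence are all hypothetical. Word segmentation is assumed to come from an external tokenizer.

```python
import random


def build_embeddings(vocab, dim, seed):
    """Hypothetical stand-in for pretrained embeddings: one random vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def fuse_char_word(words, char_emb, word_emb):
    """Return one fused vector per character of the segmented text.

    Each character vector is concatenated with the vector of the word that
    contains it, so the sequence stays character-level while every position
    also carries word-level features. Concatenation is an assumption; the
    source only says the vectors are combined at the character level.
    """
    fused = []
    for word in words:
        w_vec = word_emb[word]
        for ch in word:
            fused.append(char_emb[ch] + w_vec)  # list concatenation, not addition
    return fused


# Already-segmented short text (illustrative only).
words = ["跑道", "灯光", "故障"]
chars = sorted({ch for w in words for ch in w})

char_emb = build_embeddings(chars, dim=4, seed=0)
word_emb = build_embeddings(words, dim=3, seed=1)

fused = fuse_char_word(words, char_emb, word_emb)
print(len(fused), len(fused[0]))  # 6 7
```

In a full model, a sequence like `fused` would be the shared input to the two channels named in the abstract, with the TextCNN and BiLSTM outputs combined before the classifier.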

Key words: character-word vector fusion, civil aviation supervision, short text, text convolutional neural networks (TextCNN), bi-directional long short-term memory (BiLSTM)
