China Safety Science Journal ›› 2022, Vol. 32 ›› Issue (5): 112-118. doi: 10.16265/j.cnki.issn1003-3033.2022.05.2389

• Safety Engineering Technology •

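  • About the authors:

    XING Yulong (born 1992), male, from Hengshui, Hebei Province, PhD candidate. Research interests: data mining of rail transit information, fault diagnosis of railway signaling equipment, and intelligent operation and maintenance of railway signaling. E-mail:

    WANG Jian, professor

    SHANGGUAN Wei, professor

  • Funding:
    Supported by the Beijing Natural Science Foundation (L191013)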

Track circuit fault diagnosis method for massive imbalanced data

XING Yulong1, WANG Jian1,2, SHANGGUAN Wei1,2, PENG Cong1, ZHU Linfu3,4

  1. School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
  2. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
  3. Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
  4. China Railway Test & Certification Center, Beijing 100081, China
  • Received: 2021-12-16 Revised: 2022-04-09 Online: 2022-08-17 Published: 2022-11-28

Abstract:

To address the shift in the decision boundary of track circuit fault diagnosis models caused by class-imbalanced monitoring data, and the slow training caused by massive data volumes, a fault diagnosis method combining data resampling with an ensemble learning algorithm is proposed. First, the imbalanced data are processed by feature synthesis and data resampling, where resampling comprises random down-sampling and the Synthetic Minority Oversampling Technique (SMOTE). Second, a fault diagnosis module for massive monitoring data is built on the light gradient boosting machine (LightGBM) algorithm, which trains efficiently; the training and diagnosis workflow is designed, and key parameters are tuned by grid search and cross-validation. Finally, Macro-F1, which is less susceptible to data imbalance, is introduced as the evaluation metric for the fault diagnosis models. The results show that feature synthesis and data resampling improve, to varying degrees, the overall performance of every fault diagnosis model on imbalanced data. Compared with the other algorithms, LightGBM achieves the best overall performance and the best training time, ensuring both accuracy and speed on massive data.

Key words: imbalanced data, track circuit, fault diagnosis, ensemble learning, light gradient boosting machine (LightGBM)
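
The resampling and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`random_downsample`, `smote`, `macro_f1`), the toy data, and the target class sizes are all assumptions made for illustration, and the LightGBM training and grid search stages are omitted.

```python
import random
from collections import Counter


def random_downsample(X, y, majority_label, n_keep, rng):
    """Keep n_keep randomly chosen majority-class samples plus all other samples."""
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    keep = set(rng.sample(majority_idx, n_keep))
    kept = [i for i, label in enumerate(y) if label != majority_label or i in keep]
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X_minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_minority)
        others = [p for p in X_minority if p is not base]
        # Sort the remaining minority samples by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): 95 "normal" samples (label 0) and 5 "fault" samples (label 1).
rng = random.Random(0)
X = [[0.1 * i, 0.0] for i in range(95)] + [[10.0 + j, 10.0 + j] for j in range(5)]
y = [0] * 95 + [1] * 5

# Step 1: shrink the majority class; step 2: synthesize extra minority samples.
X_ds, y_ds = random_downsample(X, y, majority_label=0, n_keep=20, rng=rng)
X_fault = [x for x, label in zip(X_ds, y_ds) if label == 1]
X_new = smote(X_fault, n_new=15, k=3, rng=rng)
X_bal, y_bal = X_ds + X_new, y_ds + [1] * len(X_new)
class_counts = Counter(y_bal)

# A classifier that always predicts "normal" looks good on accuracy only.
y_pred_trivial = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, y_pred_trivial)) / len(y)
score = macro_f1(y, y_pred_trivial)
print(class_counts, accuracy, round(score, 3))
```

On this toy set the always-"normal" predictor reaches an accuracy of 0.95 but a Macro-F1 of only about 0.49, because the fault class contributes an F1 of zero to the unweighted average. This is the sense in which Macro-F1 is less susceptible to class imbalance than accuracy.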