Natural language processing for aviation safety reports: From classification to interactive analysis

Cited by: 105
Authors
Tanguy, Ludovic [1, 2]
Tulechki, Nikola [1, 3, 4]
Urieli, Assaf [1, 3, 4]
Hermann, Eric [4]
Raynal, Celine [4]
Affiliations
[1] CLLE ERSS CNRS, Paris, France
[2] Univ Toulouse, Computat Linguist, Toulouse, France
[3] Univ Toulouse, Toulouse, France
[4] CFH Safety Data, Toulouse, France
Keywords
Safety reports; Aviation; NLP; Document classification; Text mining
DOI
10.1016/j.compind.2015.09.005
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper we describe the different NLP techniques designed and used in collaboration between the CLLE-ERSS research laboratory and the CFH/Safety Data company to manage and analyse aviation incident reports. These reports are written every time anything abnormal occurs during a civil air flight. Although most of them relate routine problems, they are a valuable source of information about possible sources of greater danger. These texts are written in plain language, show a wide range of linguistic variation (from a telegraphic style overcrowded with acronyms to standard prose) and exist in different languages, even for a single company or country (although our main focus is on English and French). In addition to their variety, their sheer quantity (e.g. 600 per month for a large airline company) clearly requires the use of advanced NLP and text mining techniques in order to extract useful information from them. Although this context and these objectives seem to indicate that standard NLP techniques can be applied in a straightforward manner, innovative techniques are required to handle the specifics of aviation report text and the complex classification systems. We present several tools that aim to provide better access to this data (classification and information retrieval) and to help aviation safety experts in their analyses (data/text mining and interactive analysis). Some of these tools are currently in test or in use at both the national and international levels, by airline companies as well as by regulation authorities (DGAC, EASA, ICAO). (C) 2015 Elsevier B.V. All rights reserved.
Pages: 80-95
Number of pages: 16