Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans

Cited by: 20
Authors
Martinez, David [1]
Ananda-Rajah, Michelle R. [2, 3]
Suominen, Hanna [4, 5, 6, 7]
Slavin, Monica A. [8, 9]
Thursky, Karin A. [8, 9]
Cavedon, Lawrence [10]
Affiliations
[1] Univ Melbourne, CIS Dept, Melbourne, Vic 3010, Australia
[2] Alfred Hlth, Infect Dis Unit, Melbourne, Vic, Australia
[3] Univ Melbourne, Melbourne, Vic 3010, Australia
[4] NICTA, Canberra, ACT, Australia
[5] Australian Natl Univ, Canberra, ACT, Australia
[6] Univ Canberra, Canberra, ACT 2601, Australia
[7] Univ Turku, SF-20500 Turku, Finland
[8] Royal Melbourne Hosp, Peter MacCallum Canc Inst, Victorian Infect Dis Serv, Parkville, Vic, Australia
[9] Peter MacCallum Canc Inst, Dept Infect Dis, Melbourne, Vic, Australia
[10] RMIT Univ, Sch Comp Sci & IT, Melbourne, Vic, Australia
Funding
Australian Research Council
Keywords
Natural language processing; Data mining; Surveillance; Invasive fungal disease; Aspergillosis; CELL TRANSPLANT RECIPIENTS; RADIOLOGY REPORTS; BIOMEDICAL TEXT; ASPERGILLOSIS; INFECTIONS; SURVEILLANCE; PNEUMONIA; IDENTIFICATION; COMPLICATIONS; CHEMOTHERAPY;
DOI
10.1016/j.jbi.2014.11.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background: Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging IFDs, specifically those of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour-intensive exercise. We developed text classifiers for detecting such IFDs from free-text radiology (CT) reports, using machine-learning techniques.

Method: We obtained free-text reports of CT scans performed over a specific hospitalisation period (2003-2011) for 264 IFD and 289 control patients from three tertiary hospitals. We analysed IFD evidence at patient, report, and sentence levels. Three infectious disease experts annotated the reports of 73 IFD-positive patients for language suggestive of IFD at sentence level, and graded the sentences as to whether they suggested or excluded the presence of IFD. Agreement between annotators was reliable, and the annotations were used as training data for our classifiers. We tested a variety of machine learning (ML), rule-based, and hybrid systems, with feature types including bags of words, bags of phrases, and bags of concepts, as well as report-level structured features. Evaluation was carried out over a robust framework with separate Development and Held-Out datasets.

Results: The best systems (using Support Vector Machines) achieved very high recall at report and patient levels over unseen data: 95% and 100% respectively. Precision at report level over held-out data was 71%; however, most of the associated false-positive reports (53%) belonged to patients who had a previous positive report appropriately flagged by the classifier, which reduces the negative impact in practice.

Conclusions: Our machine learning application holds the potential for developing systematic IFD surveillance systems for hospital populations. © 2014 Elsevier Inc. All rights reserved.
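The abstract evaluates evidence at sentence, report, and patient levels and reports precision and recall at the report level. The following is a minimal sketch of how sentence-level predictions could be rolled up and scored. The "any positive sentence flags the report, any positive report flags the patient" rule, the function names, and the toy data are all assumptions made for illustration; the record does not state the aggregation rule the authors used.

```python
from collections import defaultdict


def aggregate(sentence_preds):
    """Roll sentence-level predictions up to report and patient level.

    sentence_preds: iterable of (patient_id, report_id, is_positive).
    Assumed rule (not from the source): a report is positive if any of its
    sentences is positive; a patient is positive if any report is positive.
    """
    report_pos = defaultdict(bool)
    for patient, report, positive in sentence_preds:
        report_pos[(patient, report)] |= positive
    patient_pos = defaultdict(bool)
    for (patient, _report), positive in report_pos.items():
        patient_pos[patient] |= positive
    return dict(report_pos), dict(patient_pos)


def precision_recall(predicted, gold):
    """Precision and recall of boolean predictions against gold labels."""
    tp = sum(1 for k, g in gold.items() if g and predicted.get(k, False))
    fp = sum(1 for k, g in gold.items() if not g and predicted.get(k, False))
    fn = sum(1 for k, g in gold.items() if g and not predicted.get(k, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two patients, four reports, five sentences.
sentences = [
    ("p1", "r1", False),
    ("p1", "r1", True),
    ("p1", "r2", False),
    ("p2", "r3", False),
    ("p2", "r4", True),
]
gold_reports = {
    ("p1", "r1"): True,
    ("p1", "r2"): False,
    ("p2", "r3"): False,
    ("p2", "r4"): False,  # classifier flags this one: a false positive
}

reports, patients = aggregate(sentences)
precision, recall = precision_recall(reports, gold_reports)
print(f"report-level precision={precision:.2f} recall={recall:.2f}")
```

Under this assumed rule, a single false-positive sentence is enough to flag a whole report, which is one way recall can stay high while report-level precision drops.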
Pages: 251-260
Number of pages: 10
Related papers
53 in total
  • [1] Agirre E., 2000, Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content, P11
  • [2] Ananda-Rajah M, 2011, 51 INT C ANT AG CHEM
  • [3] Facilitating Surveillance of Pulmonary Invasive Mold Diseases in Patients with Haematological Malignancies by Screening Computed Tomography Reports Using Natural Language Processing
    Ananda-Rajah, Michelle R.
    Martinez, David
    Slavin, Monica A.
    Cavedon, Lawrence
    Dooley, Michael
    Cheng, Allen
    Thursky, Karin A.
    [J]. PLOS ONE, 2014, 9 (09):
  • [4] Comparative clinical effectiveness of prophylactic voriconazole/posaconazole to fluconazole/itraconazole in patients with acute myeloid leukemia/myelodysplastic syndrome undergoing cytotoxic chemotherapy over a 12-year period
    Ananda-Rajah, Michelle R.
    Grigg, Andrew
    Downey, Maria T.
    Bajel, Ashish
    Spelman, Tim
    Cheng, Allen
    Thursky, Karin T.
    Vincent, Janette
    Slavin, Monica A.
    [J]. HAEMATOLOGICA-THE HEMATOLOGY JOURNAL, 2012, 97 (03) : 459 - 463
  • [5] Attributable Hospital Cost and Antifungal Treatment of Invasive Fungal Diseases in High-Risk Hematology Patients: an Economic Modeling Approach
    Ananda-Rajah, Michelle R.
    Cheng, Allen
    Morrissey, C. Orla
    Spelman, Tim
    Dooley, Michael
    Neville, A. Munro
    Slavin, Monica
    [J]. ANTIMICROBIAL AGENTS AND CHEMOTHERAPY, 2011, 55 (05) : 1953 - 1960
  • [6] [Anonymous], 2005, DATA MINING
  • [7] [Anonymous], THESIS U WAIKATO NZ
  • [8] Aronson AR, 2001, J AM MED INFORM ASSN, P17
  • [9] Pneumonia identification using statistical feature selection
    Bejan, Cosmin Adrian
    Xia, Fei
    Vanderwende, Lucy
    Wurfel, Mark M.
    Yetisgen-Yildiz, Meliha
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2012, 19 (05) : 817 - 823
  • [10] Carletta J, 1996, COMPUT LINGUIST, V22, P249