Natural Language Processing in Diagnostic Texts from Nephropathology

Cited by: 8
Authors
Legnar, Maximilian [1, 2, 7]
Daumke, Philipp [3]
Hesser, Juergen [2, 4]
Porubsky, Stefan [5]
Popovic, Zoran [2, 7]
Bindzus, Jan Niklas [2, 7]
Siemoneit, Joern-Helge Heinrich [2, 7]
Weis, Cleo-Aron [2, 6]
Affiliations
[1] Heidelberg Univ, Med Fac Mannheim, Mannheim Inst Intelligent Syst Med MIISM, D-68167 Mannheim, Germany
[2] Heidelberg Univ, Med Fac Mannheim, Inst Pathol, D-68167 Mannheim, Germany
[3] Averbis GmbH, D-79098 Freiburg, Germany
[4] Heidelberg Univ, CZS Heidelberg Ctr Model Based AI, Cent Inst Comp Engn ZITI, Data Anal & Modeling, MIISM, Med Sch, Interdisciplin, D-69117 Heidelberg, Germany
[5] Univ Hosp Mainz, Med Fac Mainz, Inst Pathol, D-55131 Mainz, Germany
[6] Med Fac Heidelberg, Inst Pathol, D-69120 Heidelberg, Germany
[7] Heidelberg Univ, Med Fac Mannheim, Inst Pathol, D-69117 Heidelberg, Germany
Keywords
NLP; text analysis; nephropathology; text classification; topic modelling; BERT; transformer encoder; machine learning; deep learning; automated classification; pathology
DOI
10.3390/diagnostics12071726
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Introduction: This study investigates whether a final diagnosis can be predicted from a written nephropathological description, used as a surrogate for image analysis, with various NLP methods. Methods: For this work, 1107 unlabelled nephropathological reports were included. (i) First, after separating each report into its microscopic description and diagnosis section, the diagnosis sections were clustered in an unsupervised manner into fewer than 20 diagnostic groups using different clustering techniques. (ii) Second, different text classification methods were used to predict the diagnostic group based on the microscopic description section. Results: The best clustering results (i) were achieved with HDBSCAN, using BoW-based feature extraction methods. Based on keywords, these clusters can be mapped to certain diagnostic groups. A transformer encoder-based approach and an SVM worked best for predicting the diagnosis from the histomorphological description (ii). Certain diagnostic groups reached F1-scores of up to 0.892, while others achieved weak classification metrics. Conclusion: While the textual morphological description alone enables retrieving the correct diagnosis for some entities, it does not work sufficiently for others. This is in accordance with a previous image analysis study on glomerular change patterns, where some diagnoses are associated with a single pattern, whereas others involve a complex combination of patterns.
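The classification step (ii) and the per-group F1-scores mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: a simple nearest-centroid classifier over bag-of-words counts stands in for the SVM and transformer encoder, and the report snippets, group labels, and all function names below are invented for illustration only.

```python
from collections import Counter
import math


def bow(text):
    """Bag-of-words features: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fit_centroids(texts, labels):
    """Sum the BoW vectors of each diagnostic group into one centroid."""
    centroids = {}
    for text, label in zip(texts, labels):
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids


def predict(centroids, text):
    """Return the group whose centroid is most similar to the text."""
    vec = bow(text)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data (NOT from the study): microscopic descriptions
# paired with a diagnostic group.
train_texts = [
    "mesangial hypercellularity with iga deposits",
    "mesangial expansion and iga positive granular deposits",
    "nodular glomerulosclerosis with thickened basement membranes",
    "diffuse thickened basement membranes and nodular mesangial sclerosis",
]
train_labels = [
    "iga_nephropathy",
    "iga_nephropathy",
    "diabetic_nephropathy",
    "diabetic_nephropathy",
]
test_texts = [
    "granular iga deposits in the mesangial region",
    "nodular sclerosis and thickened basement membranes",
]
test_labels = ["iga_nephropathy", "diabetic_nephropathy"]

centroids = fit_centroids(train_texts, train_labels)
predictions = [predict(centroids, t) for t in test_texts]
scores = f1_per_class(test_labels, predictions)
print(predictions, scores)
```

Reporting F1 per diagnostic group, rather than a single pooled score, matches the abstract's observation that some groups classify well while others do not.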
Pages: 25