Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports

Times cited: 2
Authors
Loor-Torres, Ricardo [1]
Wu, Yuqi [2]
Cabezas, Esteban [1]
Borras-Osorio, Mariana [1]
Toro-Tobon, David [3]
Duran, Mayra [1]
Al Zahidy, Misk [1]
Chavez, Maria Mateo [1]
Jacome, Cristian Soto [1]
Fan, Jungwei W. [2]
Ospina, Naykky M. Singh [4]
Wu, Yonghui [5]
Brito, Juan P. [1, 3]
Affiliations
[1] Mayo Clin, Div Endocrinol Diabet Nutr & Metab, Knowledge & Evaluat Res Unit, 200 First St SW, Rochester, MN 55902 USA
[2] Mayo Clin, Dept Artificial Intelligence & Informat, Rochester, MN USA
[3] Mayo Clin, Div Endocrinol Diabet Metab & Nutr, Rochester, MN USA
[4] Univ Florida, Dept Med, Div Endocrinol, Gainesville, FL USA
[5] Univ Florida, Dept Hlth Outcomes & Biomed Informat, Gainesville, FL USA
Funding
National Institutes of Health
Keywords
artificial intelligence; natural language processing; thyroid cancer
DOI
10.1016/j.eprac.2024.08.008
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification codes
1002; 100201
Abstract
Background: We aimed to use natural language processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports.

Methods: We analyzed 1410 surgical pathology reports from adult patients with papillary thyroid cancer, dated 2010 to 2019. Structured reports followed standardized formats, while unstructured reports were narrative. Both types were used to create a consensus-based ground truth dictionary, and the reports were categorized into modified recurrence risk levels. We developed ThyroPath, a rule-based NLP pipeline, to extract thyroid cancer features and classify them into risk categories. Training used 225 reports (150 structured, 75 unstructured), and testing used 170 reports (120 structured, 50 unstructured). Pipeline performance was assessed under both strict and lenient criteria using accuracy, precision, recall, and F1-score (the harmonic mean of precision and recall).

Results: In extraction tasks covering 18 thyroid cancer pathology features, ThyroPath achieved overall strict F1-scores of 93% for structured reports and 90% for unstructured reports. In classification tasks, ThyroPath-extracted information categorized reports by their guideline-based risk of recurrence with an overall accuracy of 93%: 76.9% for high risk, 86.8% for intermediate risk, and 100% for both low and very low risk. By comparison, when given human-extracted pathology information, ThyroPath achieved 100% accuracy across all risk categories.

Conclusions: ThyroPath shows promise for automating the extraction and recurrence risk classification of thyroid pathology reports at large scale. It offers an alternative to laborious manual review and could help advance virtual registries. However, it requires further validation before implementation. (c) 2024 AACE. Published by Elsevier Inc.
All rights are reserved, including those for text and data mining, AI training, and similar technologies.
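The abstract reports extraction performance under strict and lenient criteria. A common convention in clinical NLP evaluation, assumed in this sketch rather than taken from the paper, counts an extracted span as a strict match only when its boundaries and feature label agree exactly with a gold annotation, and as a lenient match when it overlaps a gold span carrying the same label. The `Span` type, character offsets, and feature labels below are hypothetical illustrations, not ThyroPath's data model, and accuracy is not computed here.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical annotated span: character offsets plus a feature label."""
    start: int
    end: int  # exclusive
    label: str


def strict_match(a: Span, b: Span) -> bool:
    # Assumed strict criterion: identical boundaries and identical label.
    return a == b


def lenient_match(a: Span, b: Span) -> bool:
    # Assumed lenient criterion: same label and any character overlap.
    return a.label == b.label and a.start < b.end and b.start < a.end


def prf1(predicted, gold, match):
    """Precision, recall, and F1 (harmonic mean) under a matching criterion."""
    # A prediction counts toward precision if it matches some gold span;
    # a gold span counts toward recall if some prediction matches it.
    tp_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    tp_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example with made-up offsets and labels.
gold = [Span(0, 10, "tumor_size"), Span(20, 35, "margin"), Span(40, 50, "lymph_nodes")]
pred = [Span(0, 10, "tumor_size"), Span(22, 35, "margin"), Span(60, 70, "extension")]
print([round(x, 3) for x in prf1(pred, gold, strict_match)])   # [0.333, 0.333, 0.333]
print([round(x, 3) for x in prf1(pred, gold, lenient_match)])  # [0.667, 0.667, 0.667]
```

In this toy example the boundary disagreement on the `margin` span loses a true positive under strict matching but not under lenient matching. Because every strict match of non-empty spans is also a lenient match, lenient scores under these definitions are never lower than strict ones.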
Pages: 1051-1058
Page count: 8