TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Cited by: 2

Authors:

Hussain, Sadam [1]

Naseem, Usman [2]

Ali, Mansoor [1]

Avalos, Daly Betzabeth Avendano [3]

Cardona-Huerta, Servando [3]

Palomo, Beatriz Alejandra Bosques [1]

Tamez-Pena, Jose Gerardo [3]

Affiliations:

[1] Tecnol Monterrey, Sch Engn & Sci, Monterrey 64849, Nuevo Leon, Mexico

[2] Macquarie Univ, Sch Comp, Sydney, NSW 2109, Australia

[3] Tecnol Monterrey, Sch Med, Monterrey 64849, Nuevo Leon, Mexico

Source:

BMC Medical Informatics and Decision Making | 2024 / Volume 24 / Issue 01

Keywords:

BI-RADS classification; Breast radiological reports; TF-IDF; Word2vec; NLP; ML; AUTOMATIC CLASSIFICATION; MRI

DOI:

10.1186/s12911-024-02717-7

Chinese Library Classification (CLC) number:

R-058

Abstract:

Background: Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-form radiological reports. Classifying such reports reliably requires a high-quality, curated, and annotated dataset. Currently, no publicly available breast imaging radiological report dataset exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a dataset of breast imaging radiological reports and provide benchmark results for it. The reports were originally written in Spanish. Board-certified radiologists collected and annotated them according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals, Monterrey, Mexico. The reports were first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all women, with an average age of 53 years. We used two word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.

Results: The final breast imaging radiological reports dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier with preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812). This is 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

Conclusion: In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for several ML, DL, and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective of this investigation is to provide a repository for investigators who wish to enter the field and push its boundaries further.
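The preprocessing and TF-IDF steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names `preprocess` and `tfidf`, the lowercase and whitespace normalization used to detect duplicates, the plain `tf * log(N / df)` weighting, and the toy report snippets are all assumptions made for illustration only.

```python
import math
from collections import Counter


def preprocess(reports):
    """Drop missing or empty reports and duplicates, keeping first occurrences.

    Assumption: two reports count as duplicates when they match after
    lowercasing and collapsing whitespace.
    """
    seen, cleaned = set(), []
    for text in reports:
        if text is None or not text.strip():
            continue  # missing value
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up snippets (not taken from the dataset).
reports = [
    "Benign calcifications in the right breast",
    "benign  calcifications in the right breast",  # duplicate after normalization
    None,                                          # missing value
    "Suspicious mass in the left breast",
]
cleaned = preprocess(reports)
vectors = tfidf(cleaned)
```

Terms shared by every report (such as "breast" here) receive a weight of zero under this unsmoothed weighting, while terms specific to one report receive a positive weight. Library implementations typically apply smoothing and normalization, so their values would differ.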

Pages: 10
