Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study

Cited by: 0

Authors:

Schiavone, Alice ^{[1]}

Pehrson, Lea Marie ^{[1,2,3]}

Ingala, Silvia ^{[2,4]}

Bonnevie, Rasmus ^{[5]}

Fraccaro, Marco ^{[5]}

Li, Dana ^{[2,3]}

Nielsen, Michael Bachmann ^{[1,2,3]}

Elliott, Desmond ^{[1]}

Affiliations:

[1] Univ Copenhagen, Dept Comp Sci, DK-2100 Copenhagen, Denmark

[2] Copenhagen Univ Hosp, Rigshospitalet, Dept Diagnost Radiol, DK-2100 Copenhagen, Denmark

[3] Univ Copenhagen, Dept Clin Med, DK-2100 Copenhagen, Denmark

[4] Cerebriu AS, DK-1434 Copenhagen, Denmark

[5] Unumed Aps, DK-1055 Copenhagen, Denmark

Source:

AI | 2025 / Vol. 6 / Issue 02

Keywords:

AI for healthcare; natural language processing; radiology report classification

DOI:

10.3390/ai6020037

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Background: Machine learning methods for clinical assistance require a large number of annotations from trained experts to achieve optimal performance. Previous work in natural language processing has shown that it is possible to automatically extract annotations from the free-text reports associated with chest X-rays. Methods: This study investigated techniques to extract 49 labels in a hierarchical tree structure from chest X-ray reports written in Danish. The labels were extracted from approximately 550,000 reports by performing multi-class, multi-label classification using a method based on pattern-matching rules, a classic approach in the literature for solving this task. The performance of this method was compared to that of open-source large language models that were pre-trained on Danish data and fine-tuned for classification. Results: Methods developed for English were also applicable to Danish and achieved similar performance (a weighted F1 score of 0.778 on 49 findings). A small set of expert annotations was sufficient to achieve competitive results, even with an unbalanced dataset. Conclusions: Natural language processing techniques provide a promising alternative to human expert annotation when annotations of chest X-ray reports are needed. Large language models can outperform traditional pattern-matching methods.


Pages: 19
