Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis

Cited by: 21

Authors:

Ramos Magna, Andres Alejandro [1]

Allende-Cid, Hector [1]

Taramasco, Carla [2]

Becerra, Carlos [2]

Figueroa, Rosa L. [3]

Affiliations:

[1] Pontificia Univ Catolica Valparaiso, Escuela Ingn Informat, Valparaiso 2374631, Chile

[2] Univ Valparaiso, Escuela Ingn Civil Informat, Valparaiso 2362905, Chile

[3] Univ Concepcion, Dept Ingn Elect, Concepcion 4070409, Chile

Source:

IEEE ACCESS | 2020 / Volume 8 / Issue 08

Keywords:

History; Medical diagnostic imaging; Breast cancer; Natural language processing (NLP); machine learning; deep learning; recommendation system; anamnesis; bidirectional LSTM; ICD-9-CM

DOI:

10.1109/ACCESS.2020.3000075

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Currently, one of the main challenges for information systems in healthcare is supporting health professionals in disease classification. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160,560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; the other was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system was implemented as a tool to support the physician's decision-making. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 +/- 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 +/- 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
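
To make the reported metric concrete, the following is a minimal sketch of how an F1 "mean +/- standard deviation" over 20 repetitions of 5-fold cross-validation could be aggregated. It is not the authors' code. The names `f1_score`, `k_fold_indices`, `repeated_cv_f1`, and the `fit_predict` callback are hypothetical, and averaging the five fold scores within each repetition before taking the spread across repetitions is an assumption; the abstract does not say how the +/- value was computed.

```python
import random
import statistics


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention used here when precision/recall are undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_f1(labels, fit_predict, n_repeats=20, k=5, seed=0):
    """Return (mean, std) of per-repetition F1, each averaged over k folds.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains
    on train_idx and returns one predicted label per entry of test_idx.
    """
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(n_repeats):
        fold_scores = []
        for test_idx in k_fold_indices(len(labels), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            y_true = [labels[i] for i in test_idx]
            y_pred = fit_predict(train_idx, test_idx)
            fold_scores.append(f1_score(y_true, y_pred))
        per_repeat.append(statistics.mean(fold_scores))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)
```

In practice `fit_predict` would wrap an embedding step and a classifier; here it is left abstract so the sketch only illustrates the evaluation protocol.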


Pages: 106198-106213

Page count: 16
