Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis

Cited by: 21
Authors
Ramos Magna, Andres Alejandro [1]
Allende-Cid, Hector [1]
Taramasco, Carla [2]
Becerra, Carlos [2]
Figueroa, Rosa L. [3]
Affiliations
[1] Pontificia Univ Catolica Valparaiso, Escuela Ingn Informat, Valparaiso 2374631, Chile
[2] Univ Valparaiso, Escuela Ingn Civil Informat, Valparaiso 2362905, Chile
[3] Univ Concepcion, Dept Ingn Elect, Concepcion 4070409, Chile
Source
IEEE ACCESS | 2020 / Vol. 8 / Issue 08
Keywords
History; Medical diagnostic imaging; Breast cancer; Natural language processing (NLP); machine learning; deep learning; recommendation system; anamnesis; bidirectional LSTM; ICD-9-CM
DOI
10.1109/ACCESS.2020.3000075
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A major challenge for healthcare information systems is supporting health professionals in disease classification. This work presents a recommendation system for the diagnosis of breast cancer based on patient medical histories. Natural language processing (NLP) techniques were applied to two real datasets: one comprising 160,560 medical histories of anonymous patients from a hospital in Chile, covering the categories breast cancer, cysts and nodules, other cancer, breast cancer surgeries, and other diagnoses; and one obtained from the MIMIC III dataset. Word-embedding techniques, such as word2vec's skip-gram and BERT, were combined with machine learning techniques to build a recommendation system that supports the physician's decision-making. The results indicate that word embeddings can yield a good-quality recommendation system. Across 20 experiments with 5-fold cross-validation on anamneses written in Spanish, the system achieved an F1 of 0.980 +/- 0.0014 for classifying 'cancer' versus 'not cancer' and 0.986 +/- 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
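The evaluation protocol summarized in the abstract (embed each text, classify it, and report F1 as a mean and spread over repeated 5-fold cross-validation) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the 2-d vectors, the toy texts, the nearest-centroid classifier, and every function name are hypothetical stand-ins, and reading "20 experiments" as 20 reshuffled repetitions of stratified 5-fold cross-validation is an assumption.

```python
import random
import statistics

# Hypothetical 2-d vectors standing in for skip-gram or BERT embeddings.
TOY_VECTORS = {
    "tumor": (0.9, 0.1), "carcinoma": (0.8, 0.2), "biopsy": (0.7, 0.3),
    "fracture": (0.1, 0.9), "cough": (0.2, 0.8), "fever": (0.3, 0.7),
}
# Hypothetical toy corpus: label 1 = 'cancer', label 0 = 'not cancer'.
TOY_TEXTS = ["tumor biopsy", "carcinoma", "biopsy carcinoma", "tumor", "tumor carcinoma",
             "cough fever", "fracture", "fever", "fracture cough", "cough"]
TOY_LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

def embed(text):
    """Represent a text as the average of its known word vectors."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def fit_centroids(X, y):
    """Stand-in classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(c) / len(rows) for c in zip(*rows))
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def stratified_folds(labels, k, rng):
    """Deal shuffled indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def repeated_kfold_f1(texts, labels, k=5, repeats=20, seed=0):
    """Return (mean, sample standard deviation) of F1 over repeats * k folds."""
    rng = random.Random(seed)
    X = [embed(t) for t in texts]
    scores = []
    for _ in range(repeats):
        for fold in stratified_folds(labels, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [labels[i] for i in train])
            preds = [predict(model, X[i]) for i in fold]
            scores.append(f1_score([labels[i] for i in fold], preds))
    return statistics.mean(scores), statistics.stdev(scores)
```

Calling `repeated_kfold_f1(TOY_TEXTS, TOY_LABELS)` returns the mean and standard deviation of F1 in the same "value +/- spread" form the abstract reports. The toy corpus is trivially separable, so its scores say nothing about real anamnesis data.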
Pages: 106198-106213
Page count: 16