Hybrid method to automatically extract medical document tree structure

Cited by: 5
Authors
Landolsi, Mohamed Yassine [1]
Hlaoua, Lobna [1]
Ben Romdhane, Lotfi [1]
Affiliations
[1] Univ Sousse, MARS Res Lab, SDM Res Grp, ISITCom, Hammam Sousse 4011, Tunisia
Keywords
Medical text mining; Section detection; Machine learning; Information extraction; Multimodal features; Electronic medical records; SUPPORT
DOI
10.1016/j.engappai.2023.105922
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A huge and rapidly growing quantity of medical documents is available in electronic form. These informative documents consist mostly of natural-language text, which can make them difficult to read or ambiguous, and they may even contain mistakes. Consequently, critical medical errors can occur when a doctor decides on a treatment. Information extraction from unstructured documents can address this problem. In this paper, we introduce SDM (Section Detection in Medical field), an automatic section detection method for the medical field that improves information extraction tasks by providing more context. We first construct a set of rules to prepare the training set automatically. We then use numerous features, including formatting style, syntactic, and semantic features, to train a machine learning model to find titles. Finally, a section tree is generated that can be useful for other tasks. As anticipated, our experiments show that merging these features using a Convolutional Neural Network (CNN) leads to better results on real medical documents according to the F1-score. Thus, our method benefits from layout information and can provide the document sections in tree form. It is worth noting that our method can be easily applied to other fields, since it is not strongly dependent on the document type or language.
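The abstract does not say how the section tree is assembled once titles have been found. The following is only a minimal sketch of one common way to do it, under the assumption that every detected title already carries a nesting level (1 for top-level sections, 2 for subsections, and so on). The names `Section` and `build_section_tree` and the `(text, level)` input format are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node of the section tree (hypothetical structure)."""
    title: str
    level: int
    body: list = field(default_factory=list)      # plain text lines of this section
    children: list = field(default_factory=list)  # nested Section nodes


def build_section_tree(lines):
    """Build a tree from (text, level) pairs.

    Assumption: level is None for ordinary text lines and an integer >= 1
    for lines that a title detector has labeled as section titles.
    """
    root = Section(title="ROOT", level=0)
    stack = [root]  # path from the root to the currently open section
    for text, level in lines:
        if level is None:
            # Body text belongs to the most recently opened section.
            stack[-1].body.append(text)
            continue
        # Close every open section that is at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = Section(title=text, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

Keeping a stack of open sections means each line is processed once, and a title at a shallower level automatically closes all deeper sections before it is attached to its parent.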
Pages: 15