Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports

Cited by: 0
Authors
Malashin, Ivan [1]
Masich, Igor [1]
Tynchenko, Vadim [1]
Gantimurov, Andrei [1]
Nelyub, Vladimir [1, 2]
Borodulin, Aleksei [1]
Affiliations
[1] Bauman Moscow State Tech Univ, Artificial Intelligence Technol Sci & Educ Ctr, Moscow 105005, Russia
[2] Far Eastern Fed Univ, Sci Dept, Vladivostok 690922, Russia
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION | 2024 / Vol. 6 / Issue 02
Keywords
image recognition; natural language processing; named entity recognition; information extraction; CONVOLUTIONAL NEURAL-NETWORKS; INFORMATION EXTRACTION; RECOGNITION; VIDEO;
DOI
10.3390/make6020064
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study presents an integrated approach for automatically extracting and structuring information from medical reports, captured as scanned documents or photographs, through a combination of image recognition and natural language processing (NLP) techniques like named entity recognition (NER). The primary aim was to develop an adaptive model for efficient text extraction from medical report images. This involved utilizing a genetic algorithm (GA) to fine-tune optical character recognition (OCR) hyperparameters, ensuring maximal text extraction length, followed by NER processing to categorize the extracted information into required entities, adjusting parameters if entities were not correctly extracted based on manual annotations. Despite the diverse formats of medical report images in the dataset, all in Russian, this serves as a conceptual example of information extraction (IE) that can be easily extended to other languages.
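The abstract describes a genetic algorithm that tunes OCR hyperparameters so that the length of the extracted text is maximal. The sketch below illustrates that general idea only; it is not the authors' implementation. The hyperparameter names (`scale`, `threshold`), their bounds, the GA settings, and the function `mock_extracted_length` (a synthetic stand-in for "run OCR on the image and return the length of the recognized text") are all assumptions made for illustration.

```python
import random

# Hypothetical search space: (low, high) bounds for each OCR hyperparameter.
BOUNDS = {"scale": (0.5, 4.0), "threshold": (0.0, 255.0)}


def mock_extracted_length(params):
    """Synthetic stand-in for 'run OCR and return len(text)'.

    Assumption: the score peaks at scale=2.0 and threshold=128.0.
    A real fitness function would call an OCR engine instead.
    """
    s, t = params["scale"], params["threshold"]
    return 1000.0 - 100.0 * (s - 2.0) ** 2 - 0.05 * (t - 128.0) ** 2


def random_individual(rng):
    """Sample one hyperparameter set uniformly within the bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.3):
    """Add Gaussian noise to some genes, clipped to the allowed range."""
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            noisy = child[k] + rng.gauss(0.0, 0.1 * (hi - lo))
            child[k] = min(hi, max(lo, noisy))
    return child


def run_ga(fitness, pop_size=30, generations=40, elite=2, seed=0):
    """Maximize `fitness` with truncation selection and elitism."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]   # keep the better half as parents
        children = pop[:elite]           # carry the best individuals over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness)


best = run_ga(mock_extracted_length)
print(best, mock_extracted_length(best))
```

In a real pipeline the fitness function would wrap an OCR call, and, as the abstract notes, the parameters would be adjusted further if the subsequent NER step failed to recover the manually annotated entities.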
Pages: 1361-1377
Page count: 17
Related papers
  • [1] Fiscal data in text: Information extraction from audit reports using Natural Language Processing
    Beltran, Alejandro
    DATA & POLICY, 2023, 5
  • [2] Automatic Extraction of Engineering Rules From Unstructured Text: A Natural Language Processing Approach
    Ye, Xinfeng
    Lu, Yuqian
    JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING, 2020, 20 (03)
  • [3] Anatomic stage extraction from medical reports of breast Cancer patients using natural language processing
    Deshmukh, Pratiksha R.
    Phalnikar, Rashmi
    HEALTH AND TECHNOLOGY, 2020, 10 (06) : 1555 - 1570
  • [4] Anatomic stage extraction from medical reports of breast Cancer patients using natural language processing
    Pratiksha R. Deshmukh
    Rashmi Phalnikar
    Health and Technology, 2020, 10 : 1555 - 1570
  • [5] Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system
    Fonferko-Shadrach, Beata
    Lacey, Arron S.
    Roberts, Angus
    Akbari, Ashley
    Thompson, Simon
    Ford, David V.
    Lyons, Ronan A.
    Rees, Mark I.
    Pickrell, William Owen
    BMJ OPEN, 2019, 9 (04):
  • [6] Integrated natural language processing method for text mining and visualization of underground engineering text reports
    Shao, Ruiqi
    Lin, Peng
    Xu, Zhenhao
    AUTOMATION IN CONSTRUCTION, 2024, 166
  • [7] A Natural Language Processing Pipeline of Chinese Free-Text Radiology Reports for Liver Cancer Diagnosis
    Liu, Honglei
    Xu, Yan
    Zhang, Zhiqiang
    Wang, Ni
    Huang, Yanqun
    Hu, Yanjun
    Yang, Zhenghan
    Jiang, Rui
    Chen, Hui
    IEEE ACCESS, 2020, 8 : 159110 - 159119
  • [8] MEDSYNDIKATE - a natural language system for the extraction of medical information from findings reports
    Hahn, U
    Romacker, M
    Schulz, S
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2002, 67 (1-3) : 63 - 74
  • [9] Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study
    Yu, Amy Y. X.
    Liu, Zhongyu A.
    Pou-Prom, Chloe
    Lopes, Kaitlyn
    Kapral, Moira K.
    Aviv, Richard I.
    Mamdani, Muhammad
    JMIR MEDICAL INFORMATICS, 2021, 9 (05)
  • [10] Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes
    Steinkamp, Jackson M.
    Bala, Wasif
    Sharma, Abhinav
    Kantrowitz, Jacob J.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 102