TripleMIE: Multi-modal and Multi Architecture Information Extraction

Cited: 0
Authors
Xia, Boqian [1 ]
Ma, Shihan [1 ]
Li, Yadong [1 ]
Huang, Wenkang [1 ]
Shi, Qiuhui [1 ]
Huang, Zuming [1 ]
Xie, Lele [1 ]
Wang, Hongbin [1 ]
Affiliations
[1] Ant Group, Shanghai 200001, People's Republic of China
Source
Health Information Processing. Evaluation Track Papers | 2023, Vol. 1773
Keywords
CHIP 2022; OCR identification; TripleMIE; TEXT
DOI
10.1007/978-981-99-4826-0_14
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
With the continuous development of deep learning, the technology is now widely used across many fields. In medical settings, electronic voucher recognition is a particularly challenging task. Compared with traditional manual entry, applying OCR and NLP technology can markedly improve work efficiency and reduce the training cost of business personnel, so using OCR and NLP to digitize and structure the information on these paper materials has become a hot spot in the industry. Evaluation task 4 of CHIP 2022 (OCR identification of electronic medical paper documents, ePaper) [15,16,25] requires extracting 87 fields from four types of medical voucher materials: discharge summaries, outpatient invoices, drug purchase invoices, and inpatient invoices. The task is difficult because the materials are varied, the data are noisy, and the target fields span many categories. To address it, we propose TripleMIE, a knowledge-based multi-modal and multi-architecture medical voucher information extraction method that includes an image-to-sequence model (I2SM), a large-scale PLM-based span prediction net (L-SPN), and a multi-modal information extraction model (MMIE), among other components. In addition, a knowledge-based model integration module named KME is proposed to effectively integrate prior knowledge, such as competition rules and material types, with the model results. With the help of these modules, we achieved excellent results on the official online test data, which verifies the performance of the proposed method. (https://tianchi.aliyun.com/dataset/131815#4)
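The abstract does not spell out how KME combines the three extractors with prior knowledge, and the paper publishes no reference code. The sketch below is only an illustration of how such a knowledge-based integration step might work: filter each model's field predictions with material-type rules, then keep the highest weighted-confidence value per field. All names here (Prediction, MATERIAL_FIELD_WHITELIST, MODEL_WEIGHTS, kme_merge) and the merge policy are hypothetical assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a KME-style knowledge-based integration step.
# Weights, whitelist, and merge policy are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Prediction:
    field: str    # target field name, e.g. "discharge_date"
    value: str    # extracted value
    score: float  # model confidence in [0, 1]

# Assumed prior knowledge: which fields are legal for each material type
# (e.g. an outpatient invoice carries no discharge-summary fields).
MATERIAL_FIELD_WHITELIST = {
    "outpatient_invoice": {"invoice_no", "total_amount", "patient_name"},
    "discharge_summary": {"patient_name", "admission_date", "discharge_date"},
}

# Assumed per-model trust weights (e.g. favour the multi-modal model).
MODEL_WEIGHTS = {"I2SM": 0.8, "L-SPN": 1.0, "MMIE": 1.2}

def kme_merge(material_type: str,
              outputs: dict[str, list[Prediction]]) -> dict[str, str]:
    """Merge per-model predictions: drop fields the material-type rules
    forbid, then pick the highest weighted-confidence value per field."""
    allowed = MATERIAL_FIELD_WHITELIST.get(material_type, set())
    best: dict[str, tuple[float, str]] = {}
    for model_name, preds in outputs.items():
        weight = MODEL_WEIGHTS.get(model_name, 1.0)
        for p in preds:
            if p.field not in allowed:  # rule-based filtering
                continue
            weighted = weight * p.score
            if p.field not in best or weighted > best[p.field][0]:
                best[p.field] = (weighted, p.value)
    return {field: value for field, (_, value) in best.items()}

# Usage: the models disagree on "patient_name"; the rule filter drops the
# field that is illegal for an outpatient invoice, and the weighted vote
# keeps the value from the most trusted model.
outputs = {
    "I2SM": [Prediction("patient_name", "Zhang San", 0.90)],
    "L-SPN": [Prediction("patient_name", "Zhang Sen", 0.70),
              Prediction("discharge_date", "2022-01-01", 0.95)],
    "MMIE": [Prediction("patient_name", "Zhang San", 0.85)],
}
print(kme_merge("outpatient_invoice", outputs))
# {'patient_name': 'Zhang San'}  (discharge_date filtered by material rules)
```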
Pages: 143-153
Page count: 11
References
25 references in total
[1] Chiron, Guillaume; Doucet, Antoine; Coustaty, Mickael; Moreux, Jean-Philippe. ICDAR2017 Competition on Post-OCR Text Correction. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, 2017: 1423-1428.
[2] Ford, Elizabeth; Carroll, John A.; Smith, Helen E.; Scott, Donia; Cassell, Jackie A. Extracting information from the text of electronic medical records to improve case detection: a systematic review. Journal of the American Medical Informatics Association, 2016, 23(5): 1007-1015.
[3] Gu, Zhangxuan; Meng, Changhua; Wang, Ke; Lan, Jun; Wang, Weiqiang; Gu, Ming; Zhang, Liqing. XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 4573-4582.
[4] Guo, Z. IEEE Transactions on Radiation and Plasma Medical Sciences, 2019, 3: 162. DOI: 10.1109/TRPMS.2018.2890359.
[5] Gurulingappa, Harsha; Mateen-Rajput, Abdul; Toldo, Luca. Extraction of potential adverse drug events from medical case reports. Journal of Biomedical Semantics, 2012, 3.
[6] Hahn, Udo. Yearbook of Medical Informatics, 2020, 29: 208. DOI: 10.1055/s-0040-1702001.
[7] Hallett, Catalina. 13th International Conference on Intelligent User Interfaces (IUI 2008), 2008: 80. DOI: 10.1145/1378773.1378785.
[8] Huang, Yupan; Lv, Tengchao; Cui, Lei; Lu, Yutong; Wei, Furu. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 4083-4091.
[9] Kim, Geewook; Hong, Teakgyu; Yim, Moonbin; Nam, JeongYeon; Park, Jinyoung; Yim, Jinyeong; Hwang, Wonseok; Yun, Sangdoo; Han, Dongyoon; Park, Seunghyun. OCR-Free Document Understanding Transformer. Computer Vision - ECCV 2022, Part XXVIII, 2022, 13688: 498-517.
[10] Lewis, M., et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv:1910.13461, 2019.