LitAI: Enhancing Multimodal Literature Understanding and Mining with Generative AI

Cited by: 0
Authors
Medisetti, Gowtham [1]
Compson, Zacchaeus [1]
Fan, Heng [1]
Yang, Huaxiao [1]
Feng, Yunhe [1]
Affiliations
[1] Univ North Texas, Denton, TX 76205 USA
Source
2024 IEEE 7TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 2024 | 2024
Keywords
Literature Mining; OCR; Generative AI; Prompt Engineering; ChatGPT; GPT-4;
DOI
10.1109/MIPR62202.2024.00080
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information processing and retrieval in literature are critical for advancing scientific research and knowledge discovery. The inherent multimodality and diverse literature formats, including text, tables, and figures, present significant challenges in literature information retrieval. This paper introduces LitAI, a novel approach that employs readily available generative AI tools to enhance multimodal information retrieval from literature documents. By integrating tools such as optical character recognition (OCR) with generative AI services, LitAI facilitates the retrieval of text, tables, and figures from PDF documents. We have developed specific prompts that leverage in-context learning and prompt engineering within generative AI to achieve precise information extraction. Our empirical evaluations, conducted on datasets from the ecological and biological sciences, demonstrate the superiority of our approach over several established baselines including Tesseract-OCR and GPT-4. The implementation of LitAI is accessible at https://github.com/ResponsibleAILab/LitAI.
Pages: 471-476
Page count: 6