AutoIE: An Automated Framework for Information Extraction from Scientific Literature

Times cited: 0
Authors
Liu, Yangyang [1, 2]
Li, Shoubin [1]
Huang, Kai [3]
Wang, Qing [1]
Affiliations
[1] Institute of Software, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] University of Auckland, Auckland, New Zealand
[3] National Defense University, Beijing, People's Republic of China
Source
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2024 | 2024 / Vol. 14885
Keywords
Information Extraction; Layout Analysis; Scientific Document Analysis
DOI
10.1007/978-981-97-5495-3_32
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the rapidly evolving field of scientific research, efficiently extracting key information from the growing volume of scientific papers remains a formidable challenge. This paper introduces AutoIE, a framework that automates the extraction of key data from scientific PDF documents, helping researchers identify future research directions more readily. AutoIE integrates four novel components: (1) a multi-semantic feature fusion approach to PDF document layout analysis; (2) functional block recognition in scientific texts; (3) a joint technique for extracting and correlating information on molecular sieve synthesis; and (4) an online learning paradigm tailored to molecular sieve literature. Our SBERT model achieves Macro F1 scores of 87.19 and 89.65 on the CoNLL04 and ADE datasets, respectively. In addition, a practical application of AutoIE to petrochemical molecular sieve synthesis demonstrates its effectiveness, with an accuracy of 78%. This work paves the way for improved data management and interpretation in molecular sieve synthesis and offers a useful resource for both seasoned experts and newcomers to this specialized field.
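Macro F1, the metric reported for CoNLL04 and ADE, is the unweighted mean of per-class F1 scores, so a rare relation type counts as much as a frequent one. The minimal sketch below illustrates only that averaging step. The class names and (tp, fp, fn) counts are hypothetical, and it does not reproduce the paper's evaluation protocol (for example, how entity spans and relation matches are counted).

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores: every class weighs the same."""
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical per-relation-type counts (tp, fp, fn); not data from the paper.
counts = {
    "Work_For": (8, 2, 2),  # precision 0.80, recall 0.80 -> F1 0.80
    "Live_In": (3, 1, 3),   # precision 0.75, recall 0.50 -> F1 0.60
}
print(round(100 * macro_f1(counts), 2))  # → 70.0
```

Pooling the same counts before computing F1 (micro F1) would give about 73.3 instead, because the larger Work_For class then dominates; macro averaging deliberately avoids that.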
Pages: 424–436
Page count: 13