AutoIE: An Automated Framework for Information Extraction from Scientific Literature

Cited by: 0
Authors
Liu, Yangyang [1, 2]
Li, Shoubin [1 ]
Huang, Kai [3 ]
Wang, Qing [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
[2] Univ Auckland, Auckland, New Zealand
[3] Natl Def Univ, Beijing, Peoples R China
Source
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2024 | 2024 / Vol. 14885
Keywords
Information Extraction; Layout Analysis; Scientific Document Analysis;
DOI
10.1007/978-981-97-5495-3_32
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the rapidly evolving field of scientific research, efficiently extracting key information from the burgeoning volume of scientific papers remains a formidable challenge. This paper introduces an innovative framework, AutoIE, designed to automate the extraction of vital data from scientific PDF documents, enabling researchers to discern future research trajectories more readily. AutoIE uniquely integrates four novel components: (1) A multi-semantic feature fusion-based approach for PDF document layout analysis; (2) Advanced functional block recognition in scientific texts; (3) A synergistic technique for extracting and correlating information on molecular sieve synthesis; (4) An online learning paradigm tailored for molecular sieve literature. Our SBERT model achieves high Macro F1 scores of 87.19 and 89.65 on the CoNLL04 and ADE datasets, respectively. In addition, a practical application of AutoIE in the petrochemical molecular sieve synthesis domain demonstrates its efficacy, evidenced by a 78% accuracy rate. This research paves the way for enhanced data management and interpretation in molecular sieve synthesis. It is a valuable asset for seasoned experts and newcomers in this specialized field.
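For readers unfamiliar with the metric named in the abstract, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that calculation only. The function name `macro_f1` and the toy label lists are illustrative assumptions; the sketch does not reproduce the paper's evaluation protocol, which is not described in this record.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[g] += 1   # true class g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels chosen for illustration; not data from the paper.
gold = ["Work_For", "Live_In", "Live_In", "Kill"]
pred = ["Work_For", "Live_In", "Kill", "Kill"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

Here the per-class F1 values are 1, 2/3, and 2/3, so the macro average is 7/9. Published scores such as those in the abstract are conventionally reported on a 0 to 100 scale.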
Pages: 424-436
Page count: 13