A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach

Cited by: 0
Authors
Yong, Tien Fui [1]
Azad, Saiful [2, 3]
Rahman, Mohammed Mostafizur [4]
Zamli, Kamal Z. [2, 3]
Rabby, Gollam [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Malaysia Pahang, Fac Comp Syst & Software Engn, Gambang 26300, Pahang, Malaysia
[3] UMP, IBM Ctr Excellence, Gambang, Malaysia
[4] Amer Int Univ Bangladesh, Dhaka, Bangladesh
Keywords
PDF-To-Text Conversion; Natural Language Processing; Edit Distance
DOI
10.1166/asl.2018.13029
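Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract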
Extracting text from PDF documents is never an easy task when a high degree of accuracy and consistency are the two main criteria to be attained. Although a considerable number of such systems exist, most of them fall short of offering desirable performance, especially when academic literature is concerned. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps, namely (i) reading the contents of the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated via four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and compared with eight other similar benchmark systems available on the market. The experimental results clearly demonstrate the effectiveness of the proposed system.
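The abstract names Precision, Recall, and F-Measure as evaluation metrics and lists edit distance as a keyword, but it does not state how any of them are computed. The following is only a minimal sketch under stated assumptions: scores are taken at the whitespace-token level using multiset overlap between the extracted text and a reference text, and edit distance is taken to be the standard Levenshtein distance. The function names `precision_recall_f1` and `edit_distance` are hypothetical and do not come from the paper.

```python
from collections import Counter


def precision_recall_f1(extracted: str, reference: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F-measure.

    Assumption: a token is "correct" if it appears in both texts,
    counted with multiplicity (multiset intersection).
    """
    ext_tokens = extracted.split()
    ref_tokens = reference.split()
    # Number of tokens shared by both texts, respecting repeat counts.
    overlap = sum((Counter(ext_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(ext_tokens) if ext_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For instance, comparing the extracted text "the quick brown fox" with the reference "the quick brown fox jumps" gives a precision of 1.0 and a recall of 0.8 under this definition, since all four extracted tokens are correct but one reference token is missing.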
Pages: 7844-7849
Number of pages: 6