A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach

Cited by: 0
Authors
Yong, Tien Fui [1]
Azad, Saiful [2, 3]
Rahman, Mohammed Mostafizur [4]
Zamli, Kamal Z. [2, 3]
Rabby, Gollam [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Malaysia Pahang, Fac Comp Syst & Software Engn, Gambang 26300, Pahang, Malaysia
[3] UMP, IBM Ctr Excellence, Gambang, Malaysia
[4] Amer Int Univ Bangladesh, Dhaka, Bangladesh
Keywords
PDF-To-Text Conversion; Natural Language Processing; Edit Distance
DOI
10.1166/asl.2018.13029
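Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract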
Extracting text from PDF documents is never an easy task when high accuracy and consistency are the two main criteria to be attained. Although a considerable number of such systems exist, most of them fall short of offering desirable performance, especially when academic literature is concerned. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps, namely (i) reading the contents from the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated via four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and compared with eight other similar benchmarked systems available in the market. The experimental results demonstrate the effectiveness of the proposed system.
Pages: 7844-7849
Number of pages: 6
Related papers
50 in total
  • [1] ACADEMIC TEXT CLUSTERING USING NATURAL LANGUAGE PROCESSING
    Taskiran, Salimkan Fatma
    Kaya, Ersin
    KONYA JOURNAL OF ENGINEERING SCIENCES, 2022, 10 : 41 - 51
  • [2] Neurolinguistic approach to natural language processing with applications to medical text analysis
    Duch, Wlodzislaw
    Matykiewicz, Pawel
    Pestian, John
    NEURAL NETWORKS, 2008, 21 (10) : 1500 - 1510
  • [3] Assessing academic language in tenth grade essays using natural language processing
    Potter, Andrew
    Shortt, Mitchell
    Goldshtein, Maria
    Roscoe, Rod D.
    ASSESSING WRITING, 2025, 64
  • [4] A "catchy" copy and concept evaluation system using a natural language processing approach
    Ikeda, S
    Kaneda, S
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002, : 267 - 272
  • [5] Plagiarism Detection System for Indonesia Text Based Document by Fingerprint Method and Natural Language Processing Approach
    Winarti, Titin
    Kerami, Djati
    Etp, Lussiana
    Sekarwati, Kemal Ade
    ADVANCED SCIENCE LETTERS, 2016, 22 (10) : 3128 - 3131
  • [6] A scoping review of empathy recognition in text using natural language processing
    Shetty, Vishal Anand
    Durbin, Shauna
    Weyrich, Meghan S.
    Martinez, Airin Denise
    Qian, Jing
    Chin, David L.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (03) : 762 - 775
  • [7] Using Natural Language Processing for Aftermarket Text to Increase Accuracy and Efficiency
    Hollingshead, Derek
    Parendo, Carol
    Peter, Priya
    2022 68TH ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM (RAMS 2022), 2022,
  • [8] Natural Language Processing in Mixed-methods Text Analysis: A Workflow Approach
    Parks, Louisa
    Peters, Wim
    INTERNATIONAL JOURNAL OF SOCIAL RESEARCH METHODOLOGY, 2023, 26 (04) : 377 - 389
  • [9] Automated Grading System using Natural Language Processing
    Rokade, Amit
    Patil, Bhushan
    Rajani, Sana
    Revandkar, Surabhi
    Shedge, Rajashree
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT), 2018, : 1123 - 1127
  • [10] Second language learning system on the WWW using natural language processing
    Dansuwan, S
    Nishina, K
    Akahori, K
    PROCEEDINGS OF ICCE'98, VOL 1 - GLOBAL EDUCATION ON THE NET, 1998, : 599 - 605