A Reference Paper Collection System Using Web Scraping

Cited by: 1
Authors
Naing, Inzali [1]
Aung, Soe Thandar [1]
Wai, Khaing Hsu [1]
Funabiki, Nobuo [1]
Affiliations
[1] Okayama Univ, Dept Informat & Commun Syst, Okayama 7008530, Japan
Keywords
web scraping; Google Scholar; data collection; BERT; Selenium; Flask framework; Angular
DOI
10.3390/electronics13142700
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Collecting reference papers from the Internet is one of the most important activities in advancing research and writing papers about the results. Unfortunately, the current process using Google Scholar may not be efficient, since many paper files cannot be accessed directly by the user. Even when they are accessible, their usefulness must be checked manually. In this paper, we propose a reference paper collection system that uses web scraping to automate paper collection from websites. The system collects or monitors data from the Internet, which is treated as the environment, using Selenium, a popular web scraping tool, as the sensor; it then assesses similarity to the search target by comparing keywords using the BERT model. BERT is a deep learning model for natural language processing (NLP) that can understand context by analyzing the relationships between words in a sentence bidirectionally. Python Flask is adopted for the web application server, and Angular is used for data presentation. For the evaluation, we measured the performance, investigated the accuracy, and asked members of our laboratory to use the proposed method and provide feedback. The results confirm the method's effectiveness.
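The keyword-similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a deliberately simple stand-in for BERT embeddings so that the example runs with the standard library alone, and the names `embed`, `cosine`, `rank_papers`, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase token counts (an assumption, NOT BERT)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_papers(query, papers, threshold=0.2):
    """Score each paper's keywords against the query and keep those
    at or above the (hypothetical) threshold, highest score first."""
    q = embed(query)
    scored = [(cosine(q, embed(p["keywords"])), p["title"]) for p in papers]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical scraped records; in the described system these would come
# from pages collected with Selenium.
papers = [
    {"title": "A", "keywords": "web scraping selenium"},
    {"title": "B", "keywords": "protein folding"},
]
result = rank_papers("web scraping", papers)
```

Swapping `embed` for a function that returns dense vectors from a BERT model would leave the ranking logic unchanged, provided `cosine` is adapted to dense inputs.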
Pages: 18
Related papers
50 in total
  • [41] Evaluate a Personalized Multi Agent System through Social Networks: Web scraping
    Trifa, Amal
    Sbai, Aroua Hedhili
    Chaari, Wided Lejouad
    2017 IEEE 26TH INTERNATIONAL CONFERENCE ON ENABLING TECHNOLOGIES - INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES (WETICE), 2017, : 18 - 20
  • [42] Usage of Web Scraping in the Pharmaceutical Sector
    Dahiya R.
    Nidhi
    Kumari K.
    Kumari S.
    Agarwal N.
    EAI Endorsed Transactions on Pervasive Health and Technology, 2023, 9 (01)
  • [43] A Web Metric Collection and Reporting System
    Malhotra, Ruchika
    Sharma, Anjali
    PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 661 - 667
  • [44] Using Surveys and Web-Scraping to Select Tools for Software Testing Consultancy
    Raulamo-Jurvanen, Paivi
    Kakkonen, Kari
    Mantyla, Mika
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT (PROFES 2016), 2016, 10027 : 285 - 300
  • [45] Generalized Variable Conversion Using K-means Clustering and Web Scraping
    Modarresi, Kourosh
    Munir, Abdurrahman
    COMPUTATIONAL SCIENCE - ICCS 2018, PT II, 2018, 10861 : 247 - 258
  • [46] Social Media Web Scraping using Social Media Developers API and Regex
    Dewi, Lusiana Citra
    Meiliana
    Chandra, Alvin
    4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE (ICCSCI 2019) : ENABLING COLLABORATION TO ESCALATE IMPACT OF RESEARCH RESULTS FOR SOCIETY, 2019, 157 : 444 - 449
  • [47] Research Note: Scraping Financial Data from the Web Using the R Language
    Krotov, Vlad
    Tennyson, Matthew
    JOURNAL OF EMERGING TECHNOLOGIES IN ACCOUNTING, 2018, 15 (01) : 169 - 181
  • [48] Extraction of Meaningful Information from Unstructured Clinical Notes Using Web Scraping
    Varshini, K. Sukanya
    Uthra, R. Annie
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (03)
  • [49] Taiwan Stock Tape Reading Periodically Using Web Scraping Technology with GUI
    Lin, Chun-Feng
    Yang, Sheng-Chih
    APPLIED SYSTEM INNOVATION, 2022, 5 (01)
  • [50] Teaching Tip Scaffolding in Business Analytics Education: Using Python for Web Scraping
    Jeyaraj, Anand
    Journal of Information Systems Education, 2024, 35 (04) : 438 - 450