EMACrawler: web search engine database freshness optimization

Cited by: 0
Authors
Alanoglu, Zuelfue [1]
Akcayol, M. Ali [2]
Affiliations
[1] Hatay Mustafa Kemal Univ, Antakya Meslek Yuksek Okulu, Bilgisayar Teknolojileri Bolumu, Antakya, Turkiye
[2] Gazi Univ, Muhendislik Fak, Bilgisayar Muhendisligi Bolumu, Ankara, Turkiye
Source
JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI | 2024 / Vol. 27 / No. 06
Keywords
Web crawler; update module; data collection; data indexing
DOI
10.2339/politeknik.1347054
CLC number
T [Industrial Technology]
Subject classification code
08
Abstract
In today's information and technology age, search engines have become an important part of our lives. Although search engines are the first tool people use to access information, the content they offer users still includes outdated and unnecessary information, and today's search engines often fall short when it comes to providing up-to-date data. To keep the data gathered by web crawlers up to date, revisit times must be estimated accurately. In this study, EMACrawler, a crawler based on exponential moving averages, is proposed to determine revisit times, the most important factor affecting the performance of search engines. The proposed method is evaluated using precision, total coverage, and efficiency metrics. The results show that EMACrawler retrieves the current data on web pages accurately and quickly. The experimental studies also show that EMACrawler is more successful than the other methods in obtaining up-to-date data and maintaining the freshness of the crawler database.
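The abstract does not give EMACrawler's actual update rule, so the following is only a minimal sketch of the general idea of scheduling revisits with an exponential moving average (EMA); it is not the authors' algorithm. It assumes that each visit reports whether the page changed, that the time since the last detected change approximates the true change interval, and that a quiet period longer than the current estimate should stretch the estimate. The names `PageState`, `ema`, `update_on_visit`, the smoothing factor `alpha`, and the clamping bounds `min_interval`/`max_interval` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """Per-URL bookkeeping for a hypothetical EMA-based revisit scheduler."""
    ema_interval: float      # smoothed estimate of time between changes (hours)
    last_change: float       # time at which a change was last detected (hours)
    next_visit: float = 0.0  # scheduled time of the next crawl (hours)


def ema(previous: float, observation: float, alpha: float) -> float:
    """Standard EMA update: weight `alpha` on the newest observation."""
    return alpha * observation + (1.0 - alpha) * previous


def update_on_visit(state: PageState, now: float, changed: bool,
                    alpha: float = 0.3,
                    min_interval: float = 1.0,
                    max_interval: float = 168.0) -> PageState:
    """Update the change-interval estimate after a visit and schedule the next one.

    Assumption: the time since the last detected change stands in for the
    true change interval (the real change time between visits is unknown).
    """
    if changed:
        observed = now - state.last_change
        state.ema_interval = ema(state.ema_interval, observed, alpha)
        state.last_change = now
    else:
        # Assumed heuristic: if the page has stayed unchanged for longer than
        # the current estimate, the elapsed time is a lower bound on the true
        # interval, so pull the estimate upward toward it.
        elapsed = now - state.last_change
        if elapsed > state.ema_interval:
            state.ema_interval = ema(state.ema_interval, elapsed, alpha)
    # Clamp only the scheduling interval, keeping the raw estimate intact.
    interval = min(max(state.ema_interval, min_interval), max_interval)
    state.next_visit = now + interval
    return state


if __name__ == "__main__":
    page = PageState(ema_interval=24.0, last_change=0.0)
    update_on_visit(page, now=12.0, changed=True)   # change seen after 12 h
    print(round(page.ema_interval, 2), round(page.next_visit, 2))
```

With `alpha = 0.3`, a change observed 12 hours after the previous one moves a 24-hour estimate to 0.3 × 12 + 0.7 × 24 = 20.4 hours, so the next visit is scheduled at hour 32.4. A larger `alpha` reacts faster to recent behaviour at the cost of noisier schedules; a smaller one favours stability.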
Pages: 16