An algorithm for multi-domain website classification

Cited by: 1
Authors
Ullah M.A. [1]
Tahrin A. [1]
Marjan S. [1]
Affiliations
[1] International Islamic University, Chittagong
Keywords
Classification; Dictionary; Dynamically; Feature; Matching; Text; Website
DOI
10.4018/IJWLTT.2020100104
Abstract
The web is the largest worldwide communication system of computers. It hosts local, academic, commercial, and government sites. As the number of website types increases, manual classification has become costly and cumbersome and cannot keep up with growing internet service demands, so automated classification has become important for better and more accurate search engine results. This research therefore proposes an algorithm that classifies websites automatically using textual data collected at random from their webpages. It also contributes ten dictionaries covering different domains, which are used as training data in the classification process. Classification was carried out using both the proposed algorithm and the Naïve Bayes algorithm, and the proposed algorithm outperformed Naïve Bayes in accuracy by 1.25%. The results suggest that the proposed algorithm could be applied to any number of domains, provided the related dictionaries are available. Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
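The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of one plausible dictionary-matching approach, not the authors' method. It assumes each domain is represented by a set of keywords and that a page is assigned to the domain whose dictionary matches the most tokens. The three toy dictionaries, and the names `DICTIONARIES`, `tokenize`, and `classify`, are hypothetical and stand in for the ten dictionaries the paper contributes.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries (assumption): the paper's ten domain
# dictionaries are not reproduced in this record.
DICTIONARIES = {
    "academic": {"university", "course", "student", "research", "faculty"},
    "commercial": {"shop", "price", "cart", "discount", "order"},
    "government": {"ministry", "policy", "citizen", "tax", "public"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, dictionaries=DICTIONARIES):
    """Return the domain whose dictionary matches the most tokens.

    Returns None when no dictionary word occurs in the text.
    """
    counts = Counter(tokenize(text))
    # Score each domain by the total occurrences of its dictionary words.
    scores = {
        domain: sum(counts[word] for word in words)
        for domain, words in dictionaries.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("The university offers a research course for every student."))
# prints "academic"
```

Under this assumption, supporting a new domain only requires adding another dictionary entry, which is in line with the abstract's remark that the approach extends to any number of domains when the related dictionaries are available.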
Pages: 57-65
Page count: 8