A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

Cited by: 0
Authors
Deng, Ziyun [1, 2]
He, Tingqin [2 ]
Affiliations
[1] Changsha Commerce & Tourism Coll, Coll Econ & Trade, Changsha 410116, Hunan, Peoples R China
[2] Hunan Univ, Natl Supercomp Ctr Changsha, Changsha 410116, Hunan, Peoples R China
Keywords
method for filtering pages; similarity degree; dynamic programming; combination method
DOI
10.3390/fi10120124
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
To extract target webpages from a large collection of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three "same relationships" between two nodes, each of which we define. The main innovation of MFPSDDP is that it does not need to know the structure of the webpages in advance. First, we describe the design, which uses a queue and two threads. Then, we present a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. To obtain the detailed-information webpages from 200,000 webpages downloaded from the website "www.jd.com", we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. Among the four filtering methods compared, the Recall Ratio (RR) of MFPSDDP ranks in the middle. When the number of webpages filtered approaches 200,000, the Precision Ratio (PR) of MFPSDDP is the highest of the four methods, reaching 85.1%, which is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
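The abstract names two computational pieces: a dynamic programming algorithm for the length of the longest common subsequence (LCS) and a similarity formula compared against a threshold. The sketch below illustrates how such a pipeline could look. It is not the authors' implementation. The record does not give the paper's similarity formula, so the normalization `2 * LCS / (len(a) + len(b))` is an assumption, and the function names `lcs_length`, `similarity`, and `passes_filter` are hypothetical. Nodes are compared by plain equality, as a stand-in for the Completely Same Relationship, whose definition is not reproduced here. Only the threshold value 0.2 comes from the abstract.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming recurrence, keeping only the previous
    row of the table, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching elements extend the LCS of both prefixes by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better of dropping x or dropping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]. ASSUMPTION: this normalization is illustrative,
    not the formula from the paper."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def passes_filter(candidate: Sequence[str], reference: Sequence[str],
                  threshold: float = 0.2) -> bool:
    """Keep a page if its node sequence is similar enough to a reference.
    The default threshold 0.2 is the value reported in the abstract."""
    return similarity(candidate, reference) >= threshold


# Hypothetical node sequences (tag names) for a reference and a candidate page.
reference_page = ["html", "body", "div", "p"]
candidate_page = ["html", "body", "div", "span", "p"]
print(lcs_length(reference_page, candidate_page))
print(passes_filter(candidate_page, reference_page))
```

Keeping a single row of the table is a design choice for memory economy when many pages are compared. A full two-dimensional table would only be needed to recover the subsequence itself, and filtering needs just its length.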
Pages: 12