A fast search method of similar strings from dictionaries

被引:1
作者
Fuketa, Masao [1 ]
Atlam, El-Sayed [1 ]
Fujisawa, Nobuo [1 ]
Hanafusa, Hiroshi [1 ]
Morita, Kazuhiro [1 ]
Aoe, Jun-ichi [1 ]
机构
[1] Univ Tokushima, Dept Informat Sci & Intelligent Syst, Tokushima 7708506, Japan
关键词
edit distance; similar strings; N-gram; substrings;
D O I
10.1504/IJCAT.2011.041655
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The World Wide Web is growing ever more rapidly, and there are benefits from rich information. Moreover, demands for retrieving similar strings to an input string from dictionaries have been increasing. The edit distance is necessary to retrieve information from a large amount of data using the similarity between two strings. However, drawback of this method is time consumption because the input string must be compared with all strings in dictionaries. This study proposes a new technique for retrieving similar strings from dictionaries at high speed. The method presented can retrieve all similar strings 14 times faster than unigram methods although the edit distance is 3.
引用
收藏
页码:265 / 272
页数:8
相关论文
共 11 条
[1]   A Retrieval Method of Similar Strings Using Substrings [J].
Fuketa, Masao ;
Fujisawa, Nobuo ;
Bando, Hiroaki ;
Morita, Kazuhiro ;
Aoe, Jun-ichi .
2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATIONS: ICCEA 2010, PROCEEDINGS, VOL 2, 2010, :300-304
[2]   Structural optimization of a full-text n-gram index using relational normalization [J].
Kim, Min-Soo ;
Whang, Kyu-Young ;
Lee, Jae-Gil ;
Lee, Min-Jae .
VLDB JOURNAL, 2008, 17 (06) :1485-1507
[3]  
Knuth D., 1973, SORTING SEARCHING, V3, P723
[4]  
Levenshtein V., 1966, SOVIET PHYS DOKLADY
[5]  
Marukawa K., 1996, Transactions of the Institute of Electronics, Information and Communication Engineers D-II, VJ79D-II, P785
[6]  
Mayfield J., 2003, P 26 ANN INT ACM SIG, P415
[7]   An O(ND) Difference Algorithm and Its Variations [J].
Myers, Eugene W. .
ALGORITHMICA, 1986, 1 (1-4) :251-266
[8]  
Numakura S., 1989, Transactions of the Information Processing Society of Japan, V30, P1468
[9]  
Oflazer K, 1996, COMPUT LINGUIST, V22, P73
[10]  
Ogawa Y., 1995, P 18 ACM SIGIR NEW Y, P112