Boosting exact pattern matching with extreme gradient boosting (and more)

被引:0
作者
Susik, Robert [1 ]
Grabowski, Szymon [1 ]
机构
[1] Lodz Univ Technol, Inst Appl Comp Sci, Lodz, Poland
关键词
Text matching; Algorithm selection; Machine learning; Gradient boosting; Pattern matching;
D O I
10.1007/s11227-025-07165-2
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Pattern matching is a well-known problem in computer science. Over the years, dozens of exact pattern matching algorithms have been developed. Clearly, search speed is usually the most important aspect, but it is difficult to tell which algorithm is fastest for a specific (given) pattern. Most applications, programming languages, and domain-specific tools maintain a single algorithm for exact pattern matching that may not be the best choice for all use cases. The key finding of this study is that the pattern itself contains information about which algorithm should be used to search for it. We take advantage of this fact to develop a solution that enables faster pattern searching by leveraging machine learning models to select the best-performing algorithm for a given pattern. The selection method uses machine learning models such as Random Forest, Extra Trees, AdaBoost, Bootstrap Aggregation, and Gradient Boosting. The proposed solution is online, i.e., does not require prior reading of the text and is based on the information extracted from the pattern. Experiments show that it is 11% faster than the fastest (on average) exact pattern matching algorithm.
引用
收藏
页数:15
相关论文
共 40 条
  • [1] Allauzen C, 1999, LECT NOTES COMPUT SC, V1725, P295
  • [2] Baeza-Yates R, 1989, Efficient text searching
  • [3] A NEW APPROACH TO TEXT SEARCHING
    BAEZAYATES, R
    GONNET, GH
    [J]. COMMUNICATIONS OF THE ACM, 1992, 35 (10) : 74 - 82
  • [4] Berry T, 1999, PRAG STRING CLUB WOR
  • [5] FAST STRING SEARCHING ALGORITHM
    BOYER, RS
    MOORE, JS
    [J]. COMMUNICATIONS OF THE ACM, 1977, 20 (10) : 762 - 772
  • [6] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [7] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [8] Cantone Domenico, 2005, Journal of Automata, Languages and Combinatorics, V10, P589
  • [9] Cantone D, 2004, P 3 INT C FUN ALG TU, P118
  • [10] Charras C, 1998, LECT NOTES COMPUT SC, V1448, P55, DOI 10.1007/BFb0030780