The Growing N-Gram Algorithm: A Novel Approach to String Clustering

Authors
Grappiolo, Corrado [1]
Verwielen, Eline
Noorman, Nils [2]
Affiliations
[1] ESI TNO, Eindhoven, Netherlands
[2] Philips Healthcare, Best, Netherlands
Source
ICPRAM: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS | 2019
Keywords
String Clustering; N-Grams; Operational Usage Modelling; System Verification Testing
DOI
10.5220/0007259200520063
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Connected high-tech systems allow the gathering of operational data at unprecedented volumes. A direct benefit of this is the possibility to extract usage models, that is, generic representations of how such systems are used in their field of application. Usage models are extremely important, as they can help in understanding the discrepancies between how a system was designed to be used and how it is used in practice. We interpret usage modelling as an unsupervised learning task and present a novel algorithm, hereafter called Growing N-Grams (GNG), which relies on n-grams (arguably the most popular modelling technique for natural language processing) to cluster and model, in a two-step rationale, a dataset of strings. We empirically compare its performance against some other common techniques for string processing and clustering. The gathered results suggest that the GNG algorithm is a viable approach to usage modelling.
Pages: 52-63
Page count: 12
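
The abstract names n-grams as the building block for clustering and modelling strings but does not spell out the procedure. As a rough illustration only, the sketch below shows generic n-gram extraction, counting, and a set-based similarity between two strings. It is not the GNG algorithm; the function names (`ngrams`, `ngram_counts`, `jaccard`) and the choice of Jaccard similarity are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def ngrams(seq, n):
    """Return the contiguous n-grams of seq as a list of tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_counts(strings, n):
    """Count how often each n-gram occurs across a dataset of strings."""
    counts = Counter()
    for s in strings:
        counts.update(ngrams(s, n))
    return counts


def jaccard(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two strings.

    Illustrative similarity measure only (an assumption of this sketch),
    not the measure used by the paper.
    """
    ga, gb = set(ngrams(a, n)), set(ngrams(b, n))
    if not ga and not gb:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(ga & gb) / len(ga | gb)
```

A clustering method could, for example, group strings whose pairwise similarity exceeds a threshold and then fit one n-gram count table per group; whether GNG works this way is not stated in the abstract.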