Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem

Cited by: 5
Authors
Bouras, Christos [1, 2]
Tsogkas, Vassilis [1 ]
Affiliations
[1] Univ Patras, Comp Engn & Informat Dept, Patras, Greece
[2] Comp Technol Inst & Press Diophantus, Patras 26500, Greece
Keywords
New user problem; Collaborative filtering; Clustering; W-kmeans; K-means; Personalized strategy; n-grams; Text preprocessing
DOI
10.1007/s13042-014-0264-y
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Collaborative filtering systems typically need to acquire some data about a new user before they can start making personalized suggestions, a situation commonly referred to as the "new user problem". In this work we attempt to address the new user problem via a personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordNet database and converges quickly to the actual user interests based on minimal user ratings, which are provided during the registration process. In addition, we explore whether document clustering results, in particular the clustering of news articles from the web, can be enhanced by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach that combines keyword-based clustering of web news articles with n-grams extracted from the articles at an offline stage. This technique is then compared with the plain "bag-of-words" representation that our clustering algorithm, W-kmeans, previously used. Our experiments show that by fine-tuning the relative weights of keywords and n-grams, as well as the value of n itself, a significant improvement in the clustering result metrics can be achieved.
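The idea of weighting keywords against word-based n-grams can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the function names `word_ngrams` and `weighted_features`, the default weights 0.7 and 0.3, whitespace tokenization, and plain term-frequency scoring are hypothetical and are not taken from the paper or from W-kmeans.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the word-based n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_features(text, n=2, keyword_weight=0.7, ngram_weight=0.3):
    """Combine keyword and n-gram frequencies into one weighted feature map.

    The weights and the raw term-frequency scoring are illustrative
    assumptions, not the values or the formula used in the paper.
    """
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    features = Counter()
    # Single-word keywords, scaled by the keyword weight.
    for term, count in Counter(tokens).items():
        features[term] += keyword_weight * count
    # Word n-grams, scaled by the n-gram weight.
    for gram, count in Counter(word_ngrams(tokens, n)).items():
        features[gram] += ngram_weight * count
    return dict(features)


features = weighted_features("stock market news and stock market analysis")
```

A feature map of this kind could then be passed to any clustering routine that accepts sparse weighted vectors. Under these assumptions, tuning would amount to varying `keyword_weight`, `ngram_weight`, and `n`.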
Pages: 171-184
Number of pages: 14
Related papers
31 in total
  • [1] Abou-Assaleh T., 2004, Detection of new malicious code using n-grams signatures, P193
  • [2] Fab: Content-based, collaborative recommendation
    Balabanovic, M
    Shoham, Y
    [J]. COMMUNICATIONS OF THE ACM, 1997, 40 (03) : 66 - 72
  • [3] Barrón-Cedeño A, 2009, LECT NOTES COMPUT SC, V5478, P696, DOI 10.1007/978-3-642-00958-7_69
  • [4] Bouras C., 2011, Proceedings of the Seventh International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2011), P75, DOI 10.1109/SITIS.2011.19
  • [5] PeRSSonal's core functionality evaluation: Enhancing text labeling through personalized summaries
    Bouras, Christos
    Poulopoulos, Vassilis
    Tsogkas, Vassilis
    [J]. DATA & KNOWLEDGE ENGINEERING, 2008, 64 (01) : 330 - 345
  • [6] Bouras C, 2010, LECT NOTES ARTIF INT, V6278, P379, DOI 10.1007/978-3-642-15393-8_43
  • [7] Cavnar W., 1994, P SDAIR 94
  • [8] Crane M., 2011, THESIS U OTAGO
  • [9] Damerau F, 1994, P SIGIR 94
  • [10] Ekstrand M.D., 2011, FDN TRENDS HUM COMPU, P4