POS Weighted TF-IDF Algorithm and its Application for an MOOC Search Engine

Cited by: 0
Author
Xu, Ruilin [1]
Affiliation
[1] UIUC, Dept Comp Sci, Urbana, IL 61801 USA
Source
2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2 | 2014
Keywords
Information Retrieval; TF-IDF; POS Weighted; MOOC search engine
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Term Frequency-Inverse Document Frequency (TF-IDF) has been one of the most widely used information retrieval methods for many years. Although several variants of TF-IDF have been optimized for various problems, very few of them consider the properties of the query terms themselves, which leaves considerable room for improvement. When people type a query, the verbs and nouns are usually the primary keywords that directly define it. The adjectives and adverbs are generally secondary keywords, which describe the query more precisely. Other terms tend to matter less than these and can be treated as tertiary keywords. Based on this observation, this paper proposes an improvement on the original TF-IDF algorithm: the POS Weighted TF-IDF algorithm. It takes each query term's part of speech (POS) into account and assigns each query term frequency a different weight according to the POS of that term. Using the POS Weighted TF-IDF algorithm, we developed COURSES, a massive open online course (MOOC) search engine, and achieved very positive results, which indicate the effectiveness of the proposed algorithm.
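The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the scoring formula nor the weight values, so the weights in `POS_WEIGHTS` and `TERTIARY_WEIGHT`, the smoothed IDF, and the function names `idf`, `pos_weighted_score`, and `rank` are all assumptions made for illustration. The query is assumed to be already POS-tagged with coarse tags such as NOUN, VERB, ADJ, and ADV, and each document is assumed to be a list of lowercase tokens.

```python
import math
from collections import Counter

# Hypothetical weights for the three keyword tiers named in the abstract.
# The abstract does not state the actual values used in the paper.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0,  # primary keywords
               "ADJ": 0.5, "ADV": 0.5}    # secondary keywords
TERTIARY_WEIGHT = 0.2                     # every other part of speech


def idf(term, docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def pos_weighted_score(tagged_query, doc, docs):
    """Score one document against a query given as (term, pos) pairs.

    Each query term frequency is scaled by a weight chosen from its POS,
    then combined with the document term frequency and the IDF.
    """
    doc_tf = Counter(doc)
    query_tf = Counter(tagged_query)  # counts identical (term, pos) pairs
    score = 0.0
    for (term, pos), qtf in query_tf.items():
        weight = POS_WEIGHTS.get(pos, TERTIARY_WEIGHT)
        score += weight * qtf * doc_tf[term] * idf(term, docs)
    return score


def rank(tagged_query, docs):
    """Return document indices ordered from best to worst match."""
    scores = [pos_weighted_score(tagged_query, doc, docs) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With these assumed weights, a document that matches only a query adjective scores lower than one that matches a query noun of similar rarity, which is the behavior the abstract describes. A real system would obtain the tags from a POS tagger rather than by hand.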
Pages: 868-873
Page count: 6