Multilingual single document keyword extraction for information retrieval

Times cited: 0
Authors
Bracewell, DB [1]
Ren, FJ [1]
Kuroiwa, S [1]
Affiliation
[1] Univ Tokushima, Fac Engn, Dept Informat Sci & Intelligent Syst, Tokushima 7700861, Japan
Source
Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05) | 2005
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Keywords play an important role in many aspects of information retrieval (IR); from web search to text summarization, good keywords are a necessity. Typical IR systems use algorithms that require the entire document collection to be built beforehand. While some research has been done on extracting keywords from a single document, the quality of those keywords was not judged by how well they perform in IR tasks. Moreover, those methods were designed for only one language, and their applicability to other languages is unknown. This paper therefore proposes a new algorithm that is applicable to multiple languages and extracts effective keywords that, to a high degree, uniquely identify a document. It needs only a single document to extract keywords and does not rely on machine learning methods. It was tested on a Japanese-English bilingual corpus and a portion of the Reuters corpus using a keyword search algorithm. The results show that the extracted keywords are effective at uniquely identifying the documents.
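The abstract does not describe the extraction algorithm itself, so the following is only a minimal sketch of the evaluation idea it mentions: checking whether a document's keywords retrieve that document and no other. It assumes a Boolean AND keyword search over whitespace-tokenized text; the function names (`tokenize`, `keyword_search`, `uniqueness_rate`) and the toy corpus are hypothetical, and this is not the paper's keyword search algorithm or scoring method.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased whitespace tokenization (an assumption: Japanese text
    would need a morphological analyzer instead)."""
    return set(text.lower().split())


def keyword_search(keywords: List[str], index: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical Boolean AND search: ids of documents containing every keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(doc_id for doc_id, terms in index.items() if wanted <= terms)


def uniqueness_rate(keywords_by_doc: Dict[str, List[str]], corpus: Dict[str, str]) -> float:
    """Fraction of documents whose own keywords retrieve exactly that document."""
    index = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    hits = 0
    for doc_id, keywords in keywords_by_doc.items():
        # A document counts as uniquely identified only if it is the sole result.
        if keyword_search(keywords, index) == [doc_id]:
            hits += 1
    return hits / len(keywords_by_doc) if keywords_by_doc else 0.0


# Toy data for illustration only (not from the paper's corpora).
corpus = {
    "d1": "tokushima university hosts the nlp lab",
    "d2": "the nlp lab studies keyword extraction",
    "d3": "keyword extraction for japanese and english text",
}
keywords = {"d1": ["tokushima"], "d2": ["lab", "keyword"], "d3": ["keyword"]}

print(f"uniqueness rate: {uniqueness_rate(keywords, corpus):.3f}")  # 0.667
```

Here `d3` fails because its single keyword also matches `d2`, which shows why keywords must be distinctive within the collection, not merely frequent in their own document.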
Pages: 517–522
Number of pages: 6