Acquisition and Analysis of a Crowdsourcing-Based Word Association Network

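Cited by: 6
Authors
Ding Yu (丁宇)
Che Wanxiang (车万翔)
Liu Ting (刘挺)
Zhang Meishan (张梅山)
Affiliation
[1] Research Center for Social Computing and Information Retrieval, School of Computer Science, Harbin Institute of Technology
Keywords
crowdsourcing; semantic relatedness dictionary; word association network
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification code
081203; 0835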
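Abstract
Dictionaries are a very important class of resource in Chinese natural language processing: they provide resource support for Chinese lexical, syntactic, and semantic analysis. This paper uses crowdsourcing to build a Chinese semantic relatedness dictionary. Because the dictionary is obtained indirectly, through associations elicited by trigger words, it is also called a word association network. Compared with traditional dictionaries, the word association network has the following characteristics: (1) low acquisition cost; (2) it is Internet-oriented and easy to extend; (3) relations between words are established from the perspective of human cognition and therefore match human intuition. The paper describes the acquisition method in detail and analyzes the data collected so far. It also compares the word association network with HowNet (《知网》), Tongyici Cilin (《同义词词林》), and n-grams from microblog text to illustrate these characteristics.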
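The idea of a word association network can be illustrated with a minimal sketch. It assumes, purely for illustration, that crowdsourced contributions arrive as (trigger word, associated word) pairs and that repeated pairs are counted as edge weights; this record does not describe the paper's actual data format or processing, and the function names and sample words below are hypothetical.

```python
from collections import Counter, defaultdict


def build_association_network(pairs):
    """Aggregate (trigger, response) pairs into a weighted directed graph.

    network[trigger][response] is the number of times the response
    was given for the trigger (an assumed weighting scheme).
    """
    network = defaultdict(Counter)
    for trigger, response in pairs:
        trigger, response = trigger.strip(), response.strip()
        # Skip empty entries and responses that merely repeat the trigger.
        if trigger and response and trigger != response:
            network[trigger][response] += 1
    return network


def top_associations(network, trigger, k=3):
    """Return up to k most frequent responses for a trigger word."""
    if trigger not in network:
        return []
    return network[trigger].most_common(k)


# Hypothetical sample contributions, not data from the paper.
pairs = [
    ("苹果", "水果"),
    ("苹果", "手机"),
    ("苹果", "水果"),
    ("水果", "香蕉"),
    ("苹果", "苹果"),  # self-association, ignored
]
network = build_association_network(pairs)
print(top_associations(network, "苹果"))
```

Storing the network as a mapping from each trigger word to a counter of responses keeps it easy to extend as new contributions arrive, which matches the "easy to extend" property the abstract claims.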
Pages: 100-106
Page count: 7