Learning Phrase Patterns for Text Classification Using a Knowledge Graph and Unlabeled Data

Times cited: 0
Authors
Marin, Alex [1]
Holenstein, Roman [2]
Sarikaya, Ruhi [2]
Ostendorf, Mari [1]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Microsoft Corp, Redmond, WA 98052 USA
Source
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014
Keywords
SEQUENTIAL PATTERNS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper explores a novel method for learning phrase pattern features for text classification, employing a mapping of selected words into a knowledge graph and self-training over unlabeled data. Using Support Vector Machine classification, we obtain improvements over lexical and fully-supervised phrase pattern features in domain and intent detection for language understanding, particularly in conjunction with the use of unlabeled data. Our best results are obtained using unlabeled data filtered for both model training and feature learning based on the confidence of the baseline classifiers.
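The abstract combines two ideas: phrase patterns in which selected words are replaced by their knowledge-graph type, and self-training in which unlabeled utterances are kept only when a baseline classifier labels them confidently. The Python sketch below illustrates both, under loudly stated assumptions: the `KG_TYPES` table, the keyword-based `toy_baseline` (a stand-in for the baseline SVM), the 0.5 confidence threshold, and the use of contiguous bigrams with a minimum count are all hypothetical simplifications, not the authors' pattern-learning algorithm.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical word -> knowledge-graph type table (illustrative assumption;
# a real system would look selected words up in an actual knowledge graph).
KG_TYPES: Dict[str, str] = {
    "adele": "music.artist",
    "seattle": "location.city",
    "inception": "film.film",
}


def phrase_patterns(tokens: List[str], n: int = 2) -> List[str]:
    """Contiguous n-grams in which KG-mapped words are replaced by their type."""
    mapped = [f"<{KG_TYPES[t]}>" if t in KG_TYPES else t for t in tokens]
    return [" ".join(mapped[i:i + n]) for i in range(len(mapped) - n + 1)]


def toy_baseline(tokens: List[str]) -> Tuple[str, float]:
    """Hypothetical stand-in for the baseline SVM: returns (label, confidence)."""
    if "play" in tokens:
        return ("music", 0.9)
    if "weather" in tokens:
        return ("weather", 0.8)
    return ("other", 0.3)


def confident_subset(
    unlabeled: List[List[str]],
    classify: Callable[[List[str]], Tuple[str, float]],
    threshold: float,
) -> List[Tuple[List[str], str]]:
    """Self-training filter: keep utterances labeled with confidence >= threshold."""
    kept = []
    for tokens in unlabeled:
        label, confidence = classify(tokens)
        if confidence >= threshold:
            kept.append((tokens, label))
    return kept


def learn_patterns(
    labeled: List[Tuple[List[str], str]], min_count: int = 2
) -> Dict[str, Set[str]]:
    """Per label, keep the patterns that occur at least min_count times."""
    counts: Dict[str, Counter] = {}
    for tokens, label in labeled:
        counts.setdefault(label, Counter()).update(phrase_patterns(tokens))
    return {
        label: {p for p, c in counter.items() if c >= min_count}
        for label, counter in counts.items()
    }


unlabeled = [
    ["play", "adele"],
    ["play", "adele", "songs"],
    ["weather", "in", "seattle"],
    ["hello", "there"],
]
# "hello there" gets confidence 0.3 and is filtered out before pattern learning.
selected = confident_subset(unlabeled, toy_baseline, threshold=0.5)
print(learn_patterns(selected, min_count=2))
# {'music': {'play <music.artist>'}, 'weather': set()}
```

In the setting the abstract describes, learned patterns like `play <music.artist>` would be turned into features for the SVM alongside lexical features; this sketch stops at counting and thresholding them.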
Pages: 253-257
Page count: 5