Machine Learning for Imbalanced Datasets of Recognizing Inference in Text with Linguistic Phenomena

Cited by: 0
Authors
Day, Min-Yuh [1]
Tsai, Cheng-Chia [1]
Affiliations
[1] Tamkang Univ, Dept Informat Management, New Taipei, Taiwan
Source
2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION | 2015
Keywords
Imbalanced Datasets; Linguistic Phenomena; Machine Learning; Recognizing Inference in Text; Textual Entailment
DOI
10.1109/IRI.2015.99
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing inference in text (RITE) plays an important role in the answer validation module of a question answering (QA) system. The class imbalance problem has received increasing attention in the machine learning community. In recent years, several studies have analyzed linguistic phenomena; however, little is known about how imbalanced datasets with linguistic phenomena affect recognizing inference in text. This paper presents an empirical study of learning from imbalanced RITE datasets with linguistic phenomena, with the aim of better understanding those effects. We analyze imbalanced RITE datasets with linguistic phenomena using the NTCIR-11 RITE-VAL gold standard dataset and development dataset. The experimental results suggest that the performance of a machine learning classifier can vary dramatically with the distribution of an imbalanced RITE dataset with linguistic phenomena.
Pages: 562-568
Number of pages: 7