Experimental Study of Time Series-based Dataset Selection for Effective Text Classification

Times cited: 0
Authors
Chae, Yeonghun [1]
Jeong, Do-Heon [2]
Kim, Taehong [2]
Affiliations
[1] Univ Sci & Technol, Dept Bigdata Sci, Daejeon 34113, South Korea
[2] Korea Inst Sci & Technol Informat, Convergence Technol Res Div, Daejeon 34141, South Korea
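Source
2017 9th International Conference on Knowledge and Smart Technology (KST) | 2017
Keywords
Naive-Bayes; classification; dataset selection; time series analysis
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract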
Conventional automatic document classification methods currently face challenges in learning time and computing power, owing to the ever-increasing amount of data on the web. In this paper, we propose an efficient classification method that uses time series-based dataset selection. In the proposed method, the dataset is split based on time series data, and the best set of testing documents is selected. The results of classification performance tests conducted using a Naive Bayes classifier indicate that learning from a small amount of data divided along the time series is more efficient than learning from the entire dataset.
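The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual splitting granularity or selection criterion, so the following are assumptions made only for illustration: documents are grouped by year, a small multinomial Naive Bayes classifier with add-one smoothing is trained on each slice, and the slice with the highest accuracy on a held-out set is kept. All names (`split_by_year`, `select_best_slice`, `DOCS`, `HELD_OUT`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def split_by_year(docs):
    """Group (year, label, text) records into per-year subsets."""
    buckets = defaultdict(list)
    for year, label, text in docs:
        buckets[year].append((label, text))
    return dict(buckets)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for label, text in samples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        # Sorted iteration makes tie-breaking deterministic.
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Fraction of (label, text) samples the model classifies correctly."""
    hits = sum(model.predict(text) == label for label, text in samples)
    return hits / len(samples)


def select_best_slice(docs, held_out):
    """Train one model per time slice and return (best_year, per-year accuracy).

    Assumption: held-out accuracy is the selection criterion.
    """
    scores = {}
    for year, samples in split_by_year(docs).items():
        scores[year] = accuracy(NaiveBayes().fit(samples), held_out)
    best_year = max(sorted(scores), key=scores.get)
    return best_year, scores


# Hypothetical toy corpus: (year, label, text).
DOCS = [
    (2015, "sports", "football match goal"),
    (2015, "tech", "floppy disk modem"),
    (2016, "sports", "football league goal score"),
    (2016, "tech", "smartphone app cloud"),
]
# Hypothetical held-out documents: (label, text).
HELD_OUT = [("tech", "cloud app update"), ("sports", "goal score tonight")]

best_year, scores = select_best_slice(DOCS, HELD_OUT)
print(best_year, scores)
```

In this toy setup, the slice whose vocabulary is closest to the held-out documents scores best, which is the intuition behind training on a well-chosen time slice rather than the full corpus.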
Pages: 354-358
Page count: 5