RTextTools: A Supervised Learning Package for Text Classification

Cited by: 1
Authors
Jurka, Timothy P. [1]
Collingwood, Loren [2]
Boydstun, Amber E. [1]
Grossman, Emiliano [3]
van Atteveldt, Wouter [4]
Affiliations
[1] Univ Calif Davis, Dept Polit Sci, Davis, CA 95616 USA
[2] Univ Calif Riverside, Dept Polit Sci, Riverside, CA 92521 USA
[3] Sci Po CEE, F-75007 Paris, France
[4] Vrije Univ Amsterdam, Dept Commun Sci, NL-1081 HV Amsterdam, Netherlands
Source
R JOURNAL | 2013 / Vol. 5 / Issue 01
Keywords
ACCURACY;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Social scientists have long hand-labeled texts to create datasets useful for studying topics from congressional policymaking to media reporting. Many social scientists have begun to incorporate machine learning into their toolkits. RTextTools was designed to make machine learning accessible by providing a start-to-finish product in less than 10 steps. After installing RTextTools, the initial step is to generate a document term matrix. Second, a container object is created, which holds all the objects needed for further analysis. Third, users can use up to nine algorithms to train their data. Fourth, the data are classified. Fifth, the classification is summarized. Sixth, functions are available for performance evaluation. Seventh, ensemble agreement is conducted. Eighth, users can cross-validate their data. Finally, users write their data to a spreadsheet, allowing for further manual coding if required.
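The first steps listed in the abstract (build a document-term matrix, split the data into training and test portions, train, classify, evaluate) can be illustrated with a minimal standard-library Python sketch. This is not the RTextTools API: every function name below is hypothetical, the six example texts and labels are made up for illustration, and multinomial naive Bayes is used only because it is compact, not because the abstract names it as one of the package's algorithms.

```python
import math
from collections import Counter, defaultdict

# Made-up example data (an assumption for illustration, not from the paper).
TEXTS = [
    "senate passes health care bill",
    "house votes on health insurance reform",
    "new tax cut proposal debated",
    "budget committee reviews tax plan",
    "hospital funding and health coverage expanded",
    "tax revenue and budget deficit grow",
]
LABELS = ["health", "health", "economy", "economy", "health", "economy"]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def document_term_matrix(texts):
    """Return (vocabulary, rows); each row holds the term counts of one text."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def train_naive_bayes(rows, labels):
    """Fit multinomial naive Bayes with add-one smoothing."""
    n_terms = len(rows[0])
    docs_per_class = Counter(labels)
    term_totals = defaultdict(lambda: [0] * n_terms)
    for row, label in zip(rows, labels):
        totals = term_totals[label]
        for j, count in enumerate(row):
            totals[j] += count
    model = {}
    for label, totals in term_totals.items():
        denom = sum(totals) + n_terms
        log_prior = math.log(docs_per_class[label] / len(labels))
        log_likelihood = [math.log((c + 1) / denom) for c in totals]
        model[label] = (log_prior, log_likelihood)
    return model


def classify(model, row):
    """Return the label with the highest log-posterior score for one row."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(c * ll for c, ll in zip(row, log_likelihood))
    return max(sorted(model), key=score)


def accuracy(predicted, actual):
    """Share of predictions that match the hand-assigned labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    vocab, rows = document_term_matrix(TEXTS)
    train_idx, test_idx = range(0, 4), range(4, 6)  # hypothetical split
    model = train_naive_bayes([rows[i] for i in train_idx],
                              [LABELS[i] for i in train_idx])
    predicted = [classify(model, rows[i]) for i in test_idx]
    print(predicted, accuracy(predicted, [LABELS[i] for i in test_idx]))
```

The matrix is built over all documents before splitting so that training and test rows share one vocabulary. The later steps in the abstract (ensemble agreement, cross-validation, export to a spreadsheet) are left out to keep the sketch short.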
Pages: 6-12
Page count: 7
Related papers
15 entries in total
[1]  
Adler E.Scott., 2004, Congressional Bills Project
[2]  
Baumgartner FR, 2008, DECLINE OF THE DEATH PENALTY AND THE DISCOVERY OF INNOCENCE, P1
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]   Tradeoffs in Accuracy and Efficiency in Supervised Learning Methods [J].
Collingwood, Loren ;
Wilkerson, John .
JOURNAL OF INFORMATION TECHNOLOGY & POLITICS, 2012, 9 (03) :298-318
[5]  
Feinerer I, 2008, J STAT SOFTW, V25, P1
[6]   Regularization Paths for Generalized Linear Models via Coordinate Descent [J].
Friedman, Jerome ;
Hastie, Trevor ;
Tibshirani, Rob .
JOURNAL OF STATISTICAL SOFTWARE, 2010, 33 (01) :1-22
[7]   Representation and American Governing Institutions [J].
Jones, Bryan D. ;
Larsen-Price, Heather ;
Wilkerson, John .
JOURNAL OF POLITICS, 2009, 71 (01) :277-290
[8]  
Jurka TP, 2012, R J, V4, P56
[9]  
McLaughlin M. R., 2004, Proceedings of Sheffield SIGIR 2004. The Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P329, DOI 10.1145/1008992.1009050
[10]  
Meyer D., 2012, e1071: Misc Functions of the Department of Statistics (e1071), TU Wien