Big data techniques: Large-scale text analysis for scientific and journalistic research

Cited by: 40
Authors
Arcila-Calderon, Carlos [1 ]
Barbosa-Caro, Eduar [2 ]
Cabezuelo-Lorenzo, Francisco [3 ]
Affiliations
[1] Univ Salamanca, Fac Ciencias Sociales, Campus Miguel Unamuno,Edificio FES, Salamanca 37071, Spain
[2] Univ Norte, Via Puerto Colombia,Km 5, Barranquilla, Colombia
[3] Univ Valladolid, Fac Ciencias Sociales Jurid & Comunicac, Plaza Univ 1, Segovia 40005, Spain
Source
PROFESIONAL DE LA INFORMACION | 2016 / Vol. 25 / Issue 04
Keywords
Data; Big data; Data mining; Machine learning; Topic modeling; Sentiment analysis;
DOI
10.3145/epi.2016.jul.12
Chinese Library Classification number
G2 [Information and Knowledge Dissemination];
Subject classification code
05 ; 0503 ;
Abstract
This paper conceptualizes the term big data and describes its relevance to social research and journalistic practice. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which may help scientific discovery in the social sciences and news production in journalism. We explain the e-infrastructure required for big data analysis using cloud computing, and we assess the main packages and libraries for information retrieval and analysis in commercial software and in programming languages such as Python or R.
Pages: 623 - 631
Number of pages: 9