Special Considerations for the Acquisition and Wrangling of Big Data

Cited by: 28
Authors
Braun, Michael T. [1]
Kuljanin, Goran [2]
DeShon, Richard P. [3]
Affiliations
[1] Univ S Florida, Dept Psychol, 4202 E Fowler Rd,PCD 4118G, Tampa, FL 33620 USA
[2] DePaul Univ, Dept Psychol, Chicago, IL USA
[3] Michigan State Univ, Dept Psychol, E Lansing, MI 48824 USA
Keywords
big data; data acquisition; data wrangling; data cleaning; MANAGEMENT; EMERGENCE; DYNAMICS; REVOLUTION; COGNITION; SCIENCE
DOI
10.1177/1094428117690235
Chinese Library Classification (CLC) number
B849 [Applied Psychology]
Discipline classification code
040203
Abstract
Organizational scientists must capitalize on the big data revolution to better understand the nomothetic, idiographic, multilevel, and/or dynamic processes that make up today's workplace. Simultaneously, researchers must collect high-quality data and be careful, diligent, and deliberate during data wrangling and data analysis so that all results can be replicated and all inferences are appropriate. Unfortunately, big data create many uncommon challenges during data acquisition and data wrangling that must be considered and overcome to fulfill the promise and potential of big data. Specifically, during acquisition, organizational scientists must become familiar with concepts like web scraping and databases, determine how to divide big data files into manageable chunks for cleaning and analysis, all while ensuring not to violate data usage rules and regulations. Likewise, once acquired, to effectively wrangle data so that they are ready for analysis researchers must be able to handle multiple file formats and data encoding standards, utilize a variety of software to visualize and diagnose data structure, and be adept at using functions and algorithms to determine variable structure and evaluate records and variables for missing or erroneous information. The current article provides a concise definition of big data and addresses each of these novel challenges and concepts related to big data acquisition and wrangling, specifically focusing on providing guidance and recommendations. Finally, a detailed big data example, team development using play-by-play basketball data, is provided. Each step of the process of scraping the data from the web as well as wrangling the multilevel big data into tidy data form is discussed, accompanied by a supplemental R file that contains all of the code necessary for researchers to replicate the described procedure.
Pages: 633-659
Page count: 27
Related papers
40 items in total
[1]  
[Anonymous], 2003, FIN NIH STAT SHAR RE
[2]  
[Anonymous], 2009, P 2009 AAAI SPRING S
[3]  
Arbuckle J.L., 1996, ADV STRUCTURAL EQUAT, P243, DOI 10.4324/9781315827414
[4]  
Braun M.T., 2015, IND ORG PSYCHOL, V8, P521
[5]   Spurious Results in the Analysis of Longitudinal Data in Organizational Research [J].
Braun, Michael T. ;
Kuljanin, Goran ;
DeShon, Richard P. .
ORGANIZATIONAL RESEARCH METHODS, 2013, 16 (02) :302-330
[6]  
Codd E. F., 1990, The relational model for database management: version 2
[7]   R is for Revolution: A Cutting-Edge, Free, Open Source Statistical Package [J].
Culpepper, Steven Andrew ;
Aguinis, Herman .
ORGANIZATIONAL RESEARCH METHODS, 2011, 14 (04) :735-740
[8]  
Dasu T., 2003, Exploratory Data Mining and Data Cleaning, 1st ed.
[9]   A Primer on Maximum Likelihood Algorithms Available for Use With Missing Data [J].
Enders, Craig K. .
STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2001, 8 (01) :128-141
[10]   Explaining employees' health care costs: A prospective examination of stressful job demands, personal control, and physiological reactivity [J].
Ganster, DC ;
Fox, ML ;
Dwyer, DJ .
JOURNAL OF APPLIED PSYCHOLOGY, 2001, 86 (05) :954-964