Using the Wayback Machine to Mine Websites in the Social Sciences: A Methodological Resource

Cited by: 56
Authors
Arora, Sanjay K. [1]
Li, Yin [1]
Youtie, Jan [2]
Shapira, Philip [3]
Affiliations
[1] Georgia Inst Technol, Sch Publ Policy, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Enterprise Innovat Inst, Atlanta, GA 30308 USA
[3] Univ Manchester, Manchester Inst Innovat Res, Manchester Business Sch, Manchester M13 9PL, Lancs, England
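Funding
UK Economic and Social Research Council; US National Science Foundation
Keywords
INNOVATION; WEB; NETWORKS; ENTREPRENEURSHIP; STRATEGIES
DOI
10.1002/asi.23503
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract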
Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
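As a rough illustration of the crawling step (c), the sketch below builds a query against the Internet Archive's public CDX endpoint and turns its JSON response into snapshot URLs. This record does not state what tooling the authors used, so everything here is an assumption: the helper names (`build_cdx_query`, `parse_cdx_rows`, `snapshot_url`) are hypothetical, and the choice of one capture per year via `collapse` is illustrative only. The block performs no network access; the response rows in the usage note are a hand-written sample.

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint (an assumption here, not taken from the paper).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, year_from, year_to):
    """Return a CDX query URL listing archived captures for one firm's domain."""
    params = {
        "url": domain,
        "matchType": "domain",       # include pages on the domain and its subdomains
        "from": str(year_from),      # start of the observation window
        "to": str(year_to),          # end of the observation window
        "output": "json",            # JSON array of rows; the first row is the header
        "filter": "statuscode:200",  # keep only successfully archived pages
        "collapse": "timestamp:4",   # drop adjacent captures that share the same year
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Convert the CDX JSON payload (header row plus data rows) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record):
    """Build the replay URL for one capture from its timestamp and original URL."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
```

A hypothetical usage, with a made-up response row standing in for a real CDX reply:

```python
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20100115000000", "http://example.com/", "text/html", "200", "ABC", "1234"],
]
for rec in parse_cdx_rows(sample):
    print(snapshot_url(rec))
```

Keeping the query builder separate from the parser lets the sampling frame (the list of firm domains) be checked by hand before any pages are fetched, in line with the abstract's point that both machine and human effort are needed.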
Pages: 1904-1915
Page count: 12