Use of web mining in studying innovation

Times cited: 69
Authors
Goek, Abdullah [1]
Waterworth, Alec [1]
Shapira, Philip [1, 2]
Affiliations
[1] Univ Manchester, Manchester Business Sch, Manchester Inst Innovat Res, Manchester M13 9PL, Lancs, England
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
UK Economic and Social Research Council (ESRC);
Keywords
Web mining; Web scraping; Innovation; R&D; TRIPLE-HELIX; INDUSTRY; NANOTECHNOLOGY; STRATEGIES;
DOI
10.1007/s11192-014-1434-0
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
As enterprises expand and post increasing information about their business activities on their websites, website data promises to be a valuable source for investigating innovation. This article examines the practicalities and effectiveness of web mining as a research method for innovation studies. We use web mining to explore the R&D activities of 296 UK-based green goods small and mid-size enterprises. We find that website data offers additional insights when compared with other traditional unobtrusive research methods, such as patent and publication analysis. We examine the strengths and limitations of enterprise innovation web mining in terms of a wide range of data quality dimensions, including accuracy, completeness, currency, quantity, flexibility and accessibility. We observe that far more companies in our sample report undertaking R&D activities on their websites than would be suggested by looking only at conventional data sources. While traditional methods offer information about the early phases of R&D and invention through publications and patents, web mining offers insights that are more downstream in the innovation process. Handling website data is not as easy as handling alternative data sources, and care needs to be taken in executing search strategies. Website information is also self-reported and companies may vary in their motivations for posting (or not posting) information about their activities on websites. Nonetheless, we find that web mining is a significant and useful complement to current methods, as well as offering novel insights not easily obtained from other unobtrusive sources.
Pages: 653-671
Number of pages: 19