How big is big data?

Cited by: 1
Authors
Speckhard, Daniel [1 ,2 ,3 ]
Bechtel, Tim [1 ,2 ,3 ]
Ghiringhelli, Luca M. [4 ]
Kuban, Martin [1 ,2 ]
Rigamonti, Santiago [1 ,2 ]
Draxl, Claudia [1 ,2 ,3 ]
Affiliations
[1] Humboldt Univ, Phys Dept, Zum Grossen Windkanal 2, D-12489 Berlin, Germany
[2] Humboldt Univ, CSMB, Zum Grossen Windkanal 2, D-12489 Berlin, Germany
[3] Max Planck Inst Solid State Res, Heisenbergstr 1, D-70569 Stuttgart, Germany
[4] Friedrich Alexander Univ Erlangen Nurnberg, Dept Mat Sci & Engn, Dr Mack Str 77, D-90762 Furth, Germany
DOI
10.1039/d4fd00102h
Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract
Big data has ushered in a new wave of predictive power using machine-learning models. In this work, we assess what "big" means in the context of typical materials-science machine-learning problems. This concerns not only data volume, but also data quality and veracity, as well as infrastructure issues. With selected examples, we ask (i) how models generalize to similar datasets, (ii) how high-quality datasets can be gathered from heterogeneous sources, (iii) how the feature set and complexity of a model can affect expressivity, and (iv) what infrastructure is required to create larger datasets and train models on them. In sum, we find that big data presents unique challenges along very different aspects that should serve to motivate further work. The advent of larger datasets in materials science poses unique challenges in modeling, infrastructure, and data diversity and quality.
Pages: 483-502
Number of pages: 20