A new data science research program: evaluation, metrology, standards, and community outreach

Cited by: 0
Authors
Dorr B.J. [1]
Greenberg C.S. [1]
Fontana P. [1]
Przybocki M. [1]
Le Bras M. [1]
Ploehn C. [1]
Aulov O. [1]
Michel M. [1]
Golden E.J. [1]
Chang W. [1]
Affiliation
[1] 100 Bureau Drive, Mail Stop 8940, Gaithersburg, MD 20899
Keywords
Data analytics; Data science evaluation series; Data science measurements; Data science metrics; Data science standards
DOI
10.1007/s41060-016-0016-z
Abstract
This article examines foundational issues in data science including current challenges, basic research questions, and expected advances, as the basis for a new data science research program (DSRP) and associated data science evaluation (DSE) series, introduced by the National Institute of Standards and Technology (NIST) in the fall of 2015. The DSRP is designed to facilitate and accelerate research progress in the field of data science and consists of four components: evaluation and metrology, standards, compute infrastructure, and community outreach. A key part of the evaluation and measurement component is the DSE. The DSE series aims to address logistical and evaluation design challenges while providing rigorous measurement methods and an emphasis on generalizability rather than domain- and application-specific approaches. Toward that end, each year the DSE will consist of multiple research tracks and will encourage the application of tasks that span these tracks. The evaluations are intended to facilitate research efforts and collaboration, leverage shared infrastructure, and effectively address crosscutting challenges faced by diverse data science communities. Multiple research tracks will be championed by members of the data science community with the goal of enabling rigorous comparison of approaches through common tasks, datasets, metrics, and shared research challenges. The tracks will permit us to measure several different data science technologies in a wide range of fields and will address computing infrastructure, standards for an interoperability framework, and domain-specific examples. This article also summarizes lessons learned from the data science evaluation series pre-pilot that was held in fall of 2015. © 2016, Springer International Publishing Switzerland (outside the USA).
Pages: 177-197
Page count: 20