Big data management challenges in health research-a literature review

Cited by: 39
Authors
Wang, Xiaoming [1]
Williams, Carolyn [1]
Liu, Zhen Hua [2]
Croghan, Joe [3]
Affiliations
[1] NIAID, NIH, 5601 Fishers Lane, Rockville, MD 20852 USA
[2] Oracle Corp, Redwood City, CA USA
[3] NIAID, Software Engn, Rockville, MD USA
Keywords
big data management; system performance; data quality; machine learning; SQL and NoSQL; GENETIC ARCHITECTURE; CLINICAL-RESEARCH; BLOOD-PRESSURE; BECKMAN REPORT; DATA SCIENCE; GENOMIC DATA; ENTITY; INFORMATION; ATTRIBUTE; BIOLOGY
DOI
10.1093/bib/bbx086
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Big data management for information centralization (i.e. making data of interest findable) and integration (i.e. making related data connectable) in health research is a defining challenge in biomedical informatics. While essential to create a foundation for knowledge discovery, optimized solutions to deliver high-quality and easy-to-use information resources are not thoroughly explored. In this review, we identify the gaps between current data management approaches and the need for new capacity to manage big data generated in advanced health research. Focusing on these unmet needs and well-recognized problems, we introduce state-of-the-art concepts, approaches and technologies for data management from computing academia and industry to explore improvement solutions. We explain the potential and significance of these advances for biomedical informatics. In addition, we discuss specific issues that have a great impact on technical solutions for developing the next generation of digital products (tools and data) to facilitate the raw-data-to-knowledge process in health research.
Pages: 156-167
Number of pages: 12