Data Quality Issues in Big Data

Times cited: 0
Authors
Rao, Dhana [1]
Gudivada, Venkat N. [2]
Raghavan, Vijay V. [3]
Affiliations
[1] East Carolina Univ, Dept Biol, Greenville, NC 27858 USA
[2] East Carolina Univ, Dept Comp Sci, Greenville, NC USA
[3] Univ Louisiana Lafayette, Ctr Adv Comp Studies, Lafayette, LA 70504 USA
Source
PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA | 2015
Keywords
Data quality; big data; biological data; information quality; WEB SERVICES; BIOINFORMATICS; INTEROPERABILITY; INTEGRATION; ONTOLOGIES; MANAGEMENT; WAREHOUSE; MASHUP
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Though data quality issues trace their origins back to the early days of computing, the recent emergence of Big Data has added new dimensions to them. Furthermore, given the range of Big Data applications, the potential consequences of poor data quality can be far more disastrous and widespread. This paper provides a perspective on data quality issues in the Big Data context. It also discusses the data integration issues that arise in biological databases and the attendant data quality issues.
Pages: 2654-2660
Number of pages: 7