Machines first, humans second: on the importance of algorithmic interpretation of open chemistry data

Cited by: 19
Authors
Clark, Alex M. [1]
Williams, Antony J. [2]
Ekins, Sean [3, 4]
Affiliations
[1] Mol Mat Informat, Montreal, PQ H3J 2S1, Canada
[2] Royal Soc Chem, Wake Forest, NC 27587 USA
[3] Collaborat Chem, Fuquay Varina, NC 27526 USA
[4] Collaborat Drug Discovery, Burlingame, CA 94010 USA
Keywords
Cheminformatics; File formats; Open lab notebooks; Public data; Machine learning; GRAPHICAL REPRESENTATION; QUALITY; TOOL; STANDARDS; DATABASES; H-1-NMR; CLIDE
DOI
10.1186/s13321-015-0057-7
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
The current rise in the use of open lab notebook techniques means that an increasing number of scientists make chemical information freely and openly available to the entire community as a series of micropublications released shortly after the conclusion of each experiment. We propose that this trend be accompanied by a thorough examination of data sharing priorities. We argue that the most significant immediate beneficiaries of open data are in fact chemical algorithms, which can absorb vast quantities of data and use it to present concise insights to working chemists on a scale that traditional publication methods could not achieve. Making this goal practically achievable will require a paradigm shift in the way individual scientists translate their data into digital form, since most contemporary methods of data entry are designed for presentation to humans rather than consumption by machine learning algorithms. We discuss some of the complex issues involved in fixing current methods, as well as some of the immediate benefits that can be gained when open data is published correctly using unambiguous machine-readable formats.
Pages: 20