Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Cited by: 71
Authors
McMurry, Julie A. [1, 2]
Juty, Nick [3]
Blomberg, Niklas [4]
Burdett, Tony [3]
Conlin, Tom [1, 2]
Conte, Nathalie [3]
Courtot, Melanie [3]
Deck, John [5]
Dumontier, Michel [6]
Fellows, Donal K. [7]
Gonzalez-Beltran, Alejandra [8]
Gormanns, Philipp [9]
Grethe, Jeffrey [10]
Hastings, Janna [11]
Heriche, Jean-Karim [12]
Hermjakob, Henning [3]
Ison, Jon C. [13]
Jimenez, Rafael C. [3]
Jupp, Simon [3]
Kunze, John [14]
Laibe, Camille [3]
Le Novere, Nicolas [11]
Malone, James [3]
Martin, Maria Jesus [3]
McEntyre, Johanna R. [3]
Morris, Chris [15]
Muilu, Juha [16, 17]
Mueller, Wolfgang [18]
Rocca-Serra, Philippe [8]
Sansone, Susanna-Assunta [8]
Sariyar, Murat [19]
Snoep, Jacky L. [20, 21]
Soiland-Reyes, Stian [7]
Stanford, Natalie J. [7]
Swainston, Neil [22]
Washington, Nicole [23]
Williams, Alan R. [7]
Wimalaratne, Sarala M. [3]
Winfree, Lilly M. [1, 2]
Wolstencroft, Katherine [24]
Goble, Carole [7]
Mungall, Christopher J. [23]
Haendel, Melissa A. [1, 2]
Parkinson, Helen [3]
Affiliations
[1] Oregon Hlth & Sci Univ, Dept Med Informat & Epidemiol, Portland, OR 97201 USA
[2] Oregon Hlth & Sci Univ, OHSU Lib, Portland, OR 97201 USA
[3] European Mol Biol Lab, European Bioinformat Inst, Wellcome Genome Campus, Cambridge, England
[4] ELIXIR Hub, Wellcome Genome Campus, Cambridge, England
[5] Univ Calif Berkeley, Berkeley Nat Hist Museums, Berkeley, CA 94720 USA
[6] Maastricht Univ, Inst Data Sci, Maastricht, Netherlands
[7] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[8] Univ Oxford, Oxford E Res Ctr, Oxford, England
[9] German Res Ctr Environm Hlth, Inst Expt Genet, Helmholtz Ctr Munich, Neuherberg, Germany
[10] Univ Calif San Diego, Ctr Res Biol Syst, La Jolla, CA 92093 USA
[11] Babraham Inst, Cambridge, England
[12] European Mol Biol Lab, Heidelberg, Germany
[13] Tech Univ Denmark, Dept Syst Biol, Ctr Biol Sequence Anal, Lyngby, Denmark
[14] Calif Digital Lib, Oakland, CA USA
[15] Daresbury Lab, Sci & Technol Facil Council, Warrington, Cheshire, England
[16] Univ Groningen, Univ Med Ctr Groningen, Dept Genet, Genom Coordinat Ctr, Groningen, Netherlands
[17] Univ Groningen, Groningen Bioinformat Ctr, Groningen, Netherlands
[18] Heidelberg Inst Theoret Studies, Sci Databases & Visualizat, Heidelberg, Germany
[19] Bern Univ Appl Sci Engn & Informat Technol, Inst Med Informat, Bern, Switzerland
[20] Univ Manchester, Manchester Inst Biol, Manchester, Lancs, England
[21] Stellenbosch Univ, Dept Biochem, Stellenbosch, South Africa
[22] Univ Manchester, Manchester Ctr Synthet Biol Fine & Special Chem, Manchester, Lancs, England
[23] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol, Berkeley, CA USA
[24] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
GENE NAME ERRORS; ONTOLOGIES; COMMUNITY;
DOI
10.1371/journal.pbio.2001414
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification codes
071010; 081704
Abstract
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
Pages: 18