BioCreative III interactive task: an overview

Cited by: 38
Authors
Arighi, Cecilia N. [1]
Roberts, Phoebe M. [2]
Agarwal, Shashank [3]
Bhattacharya, Sanmitra [4]
Cesareni, Gianni [5,6]
Chatr-aryamontri, Andrew [7]
Clematide, Simon [8]
Gaudet, Pascale [9,10]
Giglio, Michelle Gwinn [11]
Harrow, Ian [2]
Huala, Eva [12]
Krallinger, Martin [13]
Leser, Ulf [14]
Li, Donghui [12]
Liu, Feifan [3]
Lu, Zhiyong [15]
Maltais, Lois J. [16]
Okazaki, Naoaki [17]
Perfetto, Livia [5]
Rinaldi, Fabio [8]
Saetre, Rune [17,18]
Salgado, David [19,20]
Srinivasan, Padmini [4]
Thomas, Philippe E. [14]
Toldo, Luca [21]
Hirschman, Lynette [22]
Wu, Cathy H. [1]
Affiliations
[1] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE 19716 USA
[2] Pfizer Res Technol Ctr, Cambridge, MA USA
[3] Univ Wisconsin, Milwaukee, WI 53201 USA
[4] Univ Iowa, Dept Comp Sci, Iowa City, IA 52242 USA
[5] Univ Roma Tor Vergata, Rome, Italy
[6] IRCCS Fdn Santa Lucia, Rome, Italy
[7] Univ Edinburgh, Wellcome Trust Ctr Cell Biol, Edinburgh EH8 9YL, Midlothian, Scotland
[8] Univ Zurich, Inst Computat Linguist, Zurich, Switzerland
[9] Swiss Inst Bioinformat, CALIPHO Grp, Geneva, Switzerland
[10] Northwestern Univ, NIBIC, DictyBase, Chicago, IL 60611 USA
[11] Univ Maryland, Baltimore, MD 21201 USA
[12] Carnegie Inst Sci, TAIR, Washington, DC USA
[13] Spanish Natl Canc Res Ctr CNIO, Struct & Computat Biol Grp, Madrid, Spain
[14] Univ Berlin, D-10099 Berlin, Germany
[15] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[16] Jackson Lab, MGI, Bar Harbor, ME 04609 USA
[17] Univ Tokyo, Dept Comp Sci, Tokyo 1138654, Japan
[18] NTNU, Dept Comp & Informat Sci, Trondheim, Norway
[19] Monash Univ, Australian Regenerat Med Inst, Melbourne, Vic 3004, Australia
[20] Univ Mediterranee, Dev Biol Inst Marseille Luminy IBDML, Marseille, France
[21] Merck KGaA, Darmstadt, Germany
[22] Mitre Corp, Informat Technol Ctr, Bedford, MA 01730 USA
Source
BMC BIOINFORMATICS | 2011 / Volume 12
Funding
U.S. National Science Foundation; Swiss National Science Foundation;
Keywords
GENE; IDENTIFICATION; EXTRACTION; ONTOLOGY; MENTIONS; DATABASE; VERSION; SYSTEM;
DOI
10.1186/1471-2105-12-S8-S4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus, in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

Results: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated, and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. After using the systems to curate articles, the members answered a questionnaire on interface usability and task performance (as measured by precision and recall). Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Issues in the gene normalization task included failure of species identification and gene name ambiguity, which led to an extensive list of gene identifiers to review that, in some cases, did not contain the relevant genes. Document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.

Discussion: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
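The abstract reports task performance in terms of precision and recall without restating the definitions. As a minimal sketch, assuming the usual set-based definitions applied to the gene identifiers a system returns for one article versus those a curator selected, the two measures could be computed as follows. The function name and the identifiers are hypothetical placeholders, not data or code from the paper.

```python
def precision_recall(predicted_ids, gold_ids):
    """Set-based precision and recall for one article.

    predicted_ids: identifiers proposed by a system (hypothetical input)
    gold_ids: identifiers chosen by a curator (hypothetical input)
    """
    predicted = set(predicted_ids)
    gold = set(gold_ids)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up identifiers for illustration only.
system_output = ["GENE:1", "GENE:2", "GENE:3", "GENE:4"]
curator_gold = ["GENE:2", "GENE:4", "GENE:5"]

precision, recall = precision_recall(system_output, curator_gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case a long candidate list lowers precision, and a relevant identifier missing from the list lowers recall, which mirrors the two problems the users reported.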
Pages: 21