Evaluation of a large-scale biomedical data annotation initiative

Cited by: 8
Authors
Lacson, Ronilda [1]
Pitzer, Erik [2]
Hinske, Christian [1]
Galante, Pedro [3]
Ohno-Machado, Lucila [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Decis Syst Grp, Boston, MA 02115 USA
[2] Upper Austria Univ Appl Sci, Hagenberg, Austria
[3] Ludwig Inst Canc Res, Sao Paulo Branch, Sao Paulo, Brazil
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
GEO;
DOI
10.1186/1471-2105-10-S9-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: This study describes a large-scale manual re-annotation of data samples in the Gene Expression Omnibus (GEO), using variables and values derived from the National Cancer Institute thesaurus. A framework is described for creating an annotation scheme for various diseases that is flexible, comprehensive, and scalable. The annotation structure is evaluated by measuring coverage and agreement between annotators. Results: There were 12,500 samples annotated with approximately 30 variables in each of six disease categories: breast cancer, colon cancer, inflammatory bowel disease (IBD), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and Type 1 diabetes mellitus (DM). The annotators provided excellent variable coverage, with known values for over 98% of three critical variables: disease state, tissue, and sample type. There was 89% strict inter-annotator agreement and 92% agreement when using semantic and partial similarity measures. Conclusion: We show that it is possible to perform manual re-annotation of a large repository in a reliable manner.
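The two evaluation measures named in the abstract, variable coverage and strict inter-annotator agreement, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of `None` to mark an unknown value, and the toy sample values are all assumptions made for illustration. The semantic and partial similarity measures are not sketched here because the abstract does not define them.

```python
from typing import Optional, Sequence


def coverage(values: Sequence[Optional[str]]) -> float:
    """Fraction of samples with a known value for one variable.

    Assumption: an unknown or missing annotation is represented as None.
    """
    if not values:
        return 0.0
    known = sum(1 for v in values if v is not None)
    return known / len(values)


def strict_agreement(a: Sequence[Optional[str]],
                     b: Sequence[Optional[str]]) -> float:
    """Fraction of samples on which two annotators gave exactly the same value."""
    if len(a) != len(b):
        raise ValueError("both annotators must label the same samples")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Toy data (hypothetical): the "tissue" variable for four samples.
annotator_1 = ["breast", "colon", None, "breast"]
annotator_2 = ["breast", "colon", "blood", "breast"]

tissue_coverage = coverage(annotator_1)
tissue_agreement = strict_agreement(annotator_1, annotator_2)
```

Under these assumptions, strict agreement counts only exact matches, so a pair such as an unknown value against a known one is scored as a disagreement.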
Pages: 6