Text mining in livestock animal science: Introducing the potential of text mining to animal sciences

Times cited: 6
Authors
Sahadevan, S. [2]
Hofmann-Apitius, M. [2,3]
Schellander, K. [4]
Tesfaye, D. [4]
Fluck, J. [3]
Friedrich, C. M. [1,3]
Affiliations
[1] Univ Appl Sci & Arts, Dept Comp Sci, D-44227 Dortmund, Germany
[2] Bonn Aachen Int Ctr Informat Technol, D-53113 Bonn, Germany
[3] Fraunhofer Inst Algorithms & Sci Comp, Schloss Birlinghoven, Sankt Augustin, Germany
[4] Inst Anim Sci, Dept Anim Breeding & Husb, D-53115 Bonn, Germany
Keywords
livestock genomics; preimplantation stage; ProMiner; text mining; gene; information; agreement
DOI
10.2527/jas.2011-4841
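Chinese Library Classification (CLC) number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Discipline code
0905
Abstract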
In biological research, establishing the prior art by searching for and collecting information already available in the domain is as important as the experiments themselves. To obtain a complete overview of the relevant knowledge, researchers rely mainly on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 sources is that information in databases is readily accessible and typically well structured and condensed, whereas the information content of the scientific literature is largely unstructured; that is, it is dispersed among the many different sections of scientific texts. The traditional method of extracting information from the scientific literature involves generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. Using this "classical" approach, the researcher will more than likely miss some relevant information mentioned in the literature or have to search biological databases for further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real-world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have so far been used mainly in murine genomics and associated fields, leaving other fields of animal science, such as livestock genomics, behind.
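As a rough illustration of NER, the sketch below tags gene names in a sentence by looking up word sequences in a synonym dictionary and mapping each hit to a gene identifier. It is a minimal sketch that assumes exact, case-insensitive matching with a greedy longest-match rule; the `SYNONYMS` table, its entries and the `tag_entities` function are hypothetical and do not describe the matching algorithm of any particular NER system.

```python
import re

# Hypothetical synonym dictionary: each lowercase surface form maps to one
# gene identifier. Entries are illustrative, not a curated terminology.
SYNONYMS = {
    "insulin-like growth factor 1": "IGF1",
    "igf-1": "IGF1",
    "igf1": "IGF1",
    "growth hormone": "GH1",
    "gh1": "GH1",
}

# Longest dictionary entry, measured in tokens.
MAX_LEN = max(len(name.split()) for name in SYNONYMS)


def tokenize(text):
    """Split text into lowercase word tokens, keeping internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tag_entities(text):
    """Return (matched phrase, gene identifier) pairs via greedy longest match."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word names win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                found.append((phrase, SYNONYMS[phrase]))
                i += n
                break
        else:
            # No dictionary entry starts at this token; move on.
            i += 1
    return found


print(tag_entities("Expression of IGF-1 and growth hormone in bovine embryos."))
# [('igf-1', 'IGF1'), ('growth hormone', 'GH1')]
```

Longest match is preferred so that a multi-word name such as "insulin-like growth factor 1" is reported as one entity rather than as fragments; practical systems usually add handling of spelling variants and ambiguous names on top of such a lookup.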
The aim of this work was to develop an information retrieval platform for the livestock domain that focuses on livestock publications and recognizes relevant data on cattle and pigs. For this purpose, the rather incomplete pig and cattle gene and protein terminologies were enriched with orthologue synonyms and integrated into the NER platform ProMiner, which has been used successfully in the human genomics domain. In performance tests based on cattle literature, the system achieved fair performance, with a precision of 0.64, a recall of 0.74, and an F1 measure of 0.69.
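The reported F1 measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R); with P = 0.64 and R = 0.74 this gives 0.9472 / 1.38 ≈ 0.686, which rounds to 0.69. The sketch below reproduces that arithmetic and also shows one hypothetical way to enrich a sparse cattle synonym list with the synonyms of an orthologous human gene. The function names, gene identifiers (`BT_IGF1`, `HS_IGF1`), the one-to-one orthologue mapping and the synonym sets are assumptions for illustration, not the authors' resources or ProMiner's interface.

```python
def enrich_with_orthologues(target_dict, source_dict, orthologues):
    """Return a copy of target_dict in which each gene also carries the
    synonyms of its orthologue in source_dict.

    target_dict, source_dict: gene ID -> set of synonyms.
    orthologues: target gene ID -> source gene ID (assumed one-to-one).
    """
    enriched = {gene_id: set(names) for gene_id, names in target_dict.items()}
    for gene_id in enriched:
        source_id = orthologues.get(gene_id)
        if source_id in source_dict:
            enriched[gene_id] |= source_dict[source_id]
    return enriched


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


# Hypothetical example: a sparse cattle entry enriched from a human orthologue.
cattle = {"BT_IGF1": {"IGF1"}}
human = {"HS_IGF1": {"IGF1", "IGF-I", "somatomedin C"}}
enriched = enrich_with_orthologues(cattle, human, {"BT_IGF1": "HS_IGF1"})
print(sorted(enriched["BT_IGF1"]))  # ['IGF-I', 'IGF1', 'somatomedin C']

# F1 from the precision and recall reported in the abstract.
print(round(f1_score(0.64, 0.74), 2))  # 0.69
```

The enrichment returns a new dictionary so the original terminology stays unchanged; in practice, orthologue-derived synonyms can also introduce false positives, which is one reason precision is typically checked separately from recall.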
Pages: 3666-3676
Number of pages: 11