A bag-of-words approach for Drosophila gene expression pattern annotation

Cited by: 29
Authors
Ji, Shuiwang [1, 2]
Li, Ying-Xin [3]
Zhou, Zhi-Hua [3]
Kumar, Sudhir [1, 4]
Ye, Jieping [1, 2]
Affiliations
[1] Arizona State Univ, Ctr Evolutionary Funct Genom, Biodesign Inst, Tempe, AZ 85287 USA
[2] Arizona State Univ, Dept Comp Sci & Engn, Tempe, AZ 85287 USA
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
[4] Arizona State Univ, Sch Life Sci, Tempe, AZ 85287 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
EFFICIENT VISUAL-SEARCH; GLOBAL ANALYSIS; CLASSIFICATION
DOI
10.1186/1471-2105-10-119
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.

Results: We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.

Conclusion: The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
Pages: 16