An equivalence test between features lists, based on the Sorensen-Dice index and the joint frequencies of GO term enrichment

Cited by: 1
Authors
Flores, Pablo [1, 4]
Salicru, Miquel [2]
Sanchez-Pla, Alex [2, 3]
Ocana, Jordi [2]
Affiliations
[1] Escuela Super Politecn Chimborazo ESPOCH, Res Grp Data Sci CIDED, Panamer Sur Km 1 1-2, Riobamba, Ecuador
[2] Univ Barcelona, Stat Sect, Dept Genet Microbiol & Stat, Ave Diagonal 643, Barcelona 08028, Spain
[3] Vall dHebron Inst Res VHIR, Stat & Bioinformat Unit, Vall dHebron 119-129, Barcelona 08035, Spain
[4] Univ Politecn Cataluna, Fac Math & Stat, Dept Stat & Operat Res, Barcelona, Spain
Keywords
Delta method; Bootstrap; Simulation; Type I error; Irrelevance of dissimilarity; Gene lists; Semantic similarity; Gene Ontology; R package
DOI
10.1186/s12859-022-04739-2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: In integrative bioinformatic analyses, it is of great interest to establish the equivalence between gene lists or, more generally, feature lists, up to a given level and in terms of their annotations in the Gene Ontology (GO). The aim of this article is to present an equivalence test based on the proportion of GO terms that are declared enriched in both lists simultaneously. Results: On the basis of these data, the dissimilarity between gene lists is measured by means of the Sorensen-Dice index. We present two versions of the same test: one based on the asymptotic normality of the test statistic and the other based on the bootstrap method. Conclusions: The accuracy of these tests is studied by simulation, and their potential usefulness is illustrated by applying them to two real datasets: a collection of gene lists related to cancer and a collection of gene lists related to kidney rejection after transplantation.
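To make the idea in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the joint enrichment frequencies are summarised as three counts: `n11` (GO terms enriched in both lists), `n10` and `n01` (terms enriched in only one list). It further assumes that the standard error is obtained by the delta method under a multinomial model conditioned on terms enriched in at least one list, and that equivalence is declared when the one-sided test rejects "dissimilarity >= d0". The function names, the example counts, and the irrelevance limit `d0 = 0.5` are illustrative choices, not values taken from the article.

```python
import math
from statistics import NormalDist


def sorensen_dissimilarity(n11, n10, n01):
    """Sorensen-Dice dissimilarity: 1 - 2*n11 / (2*n11 + n10 + n01)."""
    total = 2 * n11 + n10 + n01
    if total == 0:
        raise ValueError("no GO term is enriched in either list")
    return (n10 + n01) / total


def delta_method_se(n11, n10, n01):
    """Delta-method standard error (assumed model, see lead-in).

    With p11 = n11 / n and n = n11 + n10 + n01, the dissimilarity is
    d = (1 - p11) / (1 + p11), so dd/dp11 = -2 / (1 + p11)**2 and
    Var(p11_hat) = p11 * (1 - p11) / n.
    """
    n = n11 + n10 + n01
    p11 = n11 / n
    return 2 * math.sqrt(p11 * (1 - p11) / n) / (1 + p11) ** 2


def equivalence_test(n11, n10, n01, d0, alpha=0.05):
    """One-sided test of H0: d >= d0 against H1: d < d0 (equivalence)."""
    d_hat = sorensen_dissimilarity(n11, n10, n01)
    se = delta_method_se(n11, n10, n01)
    if se == 0:
        raise ValueError("degenerate table: standard error is zero")
    z = (d_hat - d0) / se
    p_value = NormalDist().cdf(z)  # small when d_hat is well below d0
    return d_hat, se, p_value, p_value < alpha


# Hypothetical counts: 60 terms enriched in both lists, 20 + 20 in only one.
d_hat, se, p_value, equivalent = equivalence_test(60, 20, 20, d0=0.5)
print(f"d = {d_hat:.3f}, se = {se:.4f}, p = {p_value:.3g}, equivalent: {equivalent}")
```

The bootstrap version mentioned in the abstract would replace the normal quantile with one estimated by resampling the enrichment table; it is omitted here to keep the sketch short.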
Pages: 21