Multiscale analysis of count data through topic alignment

Cited by: 3
Authors
Fukuyama, Julia [1]
Sankaran, Kris [2]
Symul, Laura [3]
Affiliations
[1] Indiana Univ, Dept Stat, 919 E 10th St, Bloomington, IN 47408 USA
[2] Univ Wisconsin, Dept Stat, 1300 Univ Ave, Madison, WI 53706 USA
[3] Stanford Univ, Dept Stat, 390 Jane Stanford Way, Stanford, CA 94305 USA
Keywords
Community analysis; Microbiota; Multiresolution; Mixed membership models; Topic model
DOI
10.1093/biostatistics/kxac018
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics K. Since there is no definitive way to choose K and since a true value might not exist, we develop a method, which we call topic alignment, to study the relationships across models with different K. In addition, we present three diagnostics based on the alignment. These techniques can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits into more topics when K increases. This strategy gives more insight into the process of generating the data than choosing a single value of K would. We design a visual representation of these cross-model relationships, show the effectiveness of these tools for interpreting the topics on simulated and real data, and release an accompanying R package, alto.
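To make the idea of relating topics across models with different K concrete, here is a minimal sketch. It is an illustration, not the alto package's API or necessarily the authors' exact procedure. It assumes two fitted models are summarized by per-sample topic membership matrices (the names `gamma_k2` and `gamma_k3` and the helper functions are hypothetical), and it scores each pair of topics by the membership mass that the samples share between them.

```python
import numpy as np


def alignment_weights(gamma_a, gamma_b):
    """Return a (K_a x K_b) matrix of weights between topics of two models.

    gamma_a: (n_samples, K_a) topic memberships, each row summing to 1.
    gamma_b: (n_samples, K_b) topic memberships for the same samples.
    Entry [k, l] sums, over samples, the product of membership in topic k
    of the first model and topic l of the second model.
    """
    gamma_a = np.asarray(gamma_a, dtype=float)
    gamma_b = np.asarray(gamma_b, dtype=float)
    if gamma_a.shape[0] != gamma_b.shape[0]:
        raise ValueError("both models must describe the same samples")
    return gamma_a.T @ gamma_b


def normalize_rows(weights):
    """Rescale each row to sum to 1: where does each coarse topic's mass go?"""
    return weights / weights.sum(axis=1, keepdims=True)


# Toy memberships for four samples (made up for illustration only).
gamma_k2 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
gamma_k3 = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])

weights = alignment_weights(gamma_k2, gamma_k3)
shares = normalize_rows(weights)
print(weights)
print(shares)
```

In this toy case, the first topic of the K = 2 model sends all of its weight to a single topic of the K = 3 model, which is the pattern of a topic that persists, while the second topic divides its weight evenly between two topics, which is the pattern of a split.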
Pages: 1045-1065
Page count: 21