Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

Cited by: 28
Authors
Wu, Jiayi [1, 2]
Ma, Yong-Bei [2]
Congdon, Charles [3]
Brett, Bevin [3]
Chen, Shuobing [1, 2]
Xu, Yaofang [2, 4]
Ouyang, Qi [1, 5]
Mao, Youdong [1, 2, 6]
Affiliations
[1] Peking Univ, Sch Phys, State Key Lab Artificial Microstruct & Mesoscop P, Inst Condensed Matter Phys,Ctr Quantitat Biol, Beijing, Peoples R China
[2] Dana Farber Canc Inst, Intel Parallel Comp Ctr Struct Biol, Boston, MA 02115 USA
[3] Intel Corp, Software & Serv Grp, Santa Clara, CA USA
[4] Peking Univ, Hlth Sci Ctr, Dept Biophys, Beijing, Peoples R China
[5] Peking Univ, Peking Tsinghua Joint Ctr Life Sci, Beijing, Peoples R China
[6] Harvard Med Sch, Dept Microbiol & Immunobiol, Boston, MA USA
Source
PLOS ONE | 2017 / Vol. 12 / Issue 08
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
NONLINEAR DIMENSIONALITY REDUCTION; MICROSCOPY; CLASSIFICATION; PROJECTION; MACROMOLECULES; IMAGES; SPARX; SUITE; XMIPP;
DOI
10.1371/journal.pone.0182130
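Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract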
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
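As a rough illustration of the generative topographic mapping (GTM) idea named in the abstract, the following is a minimal NumPy sketch of textbook GTM fitted by expectation-maximization, in the spirit of Bishop, Svensén and Williams. It is not the authors' implementation. The function name `gtm_fit`, its parameters, the 2-D latent grid, the PCA-based initialization, and the use of generic feature vectors in place of aligned particle images are all assumptions made for this example.

```python
import numpy as np

def _sq_dists(Y, T):
    """Squared Euclidean distances between rows of Y (K x D) and T (N x D)."""
    d = (Y**2).sum(1)[:, None] - 2.0 * Y @ T.T + (T**2).sum(1)[None, :]
    return np.maximum(d, 0.0)  # guard against tiny negative round-off

def _responsibilities(Y, T, beta):
    """E-step: posterior probability of each latent node for each data point."""
    log_r = -0.5 * beta * _sq_dists(Y, T)
    log_r -= log_r.max(axis=0, keepdims=True)  # numerical stability
    R = np.exp(log_r)
    return R / R.sum(axis=0, keepdims=True)

def gtm_fit(T, grid=5, rbf_grid=3, n_iter=30, alpha=1e-3):
    """Hypothetical helper: fit a small GTM to T (N x D).

    Returns the latent nodes X (K x 2) and responsibilities R (K x N).
    """
    N, D = T.shape
    g = np.linspace(-1.0, 1.0, grid)
    X = np.array([(a, b) for a in g for b in g])   # latent nodes, K x 2
    c = np.linspace(-1.0, 1.0, rbf_grid)
    C = np.array([(a, b) for a in c for b in c])   # RBF centres, M x 2
    sigma = c[1] - c[0]                            # RBF width (assumed choice)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    # Basis matrix with a trailing bias column, K x (M + 1)
    Phi = np.hstack([np.exp(-d2 / (2.0 * sigma**2)), np.ones((len(X), 1))])

    # Initialise the mapping so the nodes span the first two principal axes.
    mean = T.mean(axis=0)
    _, S, Vt = np.linalg.svd(T - mean, full_matrices=False)
    target = mean + X @ (Vt[:2] * (S[:2, None] / np.sqrt(N)))
    W = np.linalg.lstsq(Phi, target, rcond=None)[0]
    beta = 1.0 / T.var()                           # initial noise precision

    for _ in range(n_iter):
        R = _responsibilities(Phi @ W, T, beta)    # E-step
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (alpha / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)      # M-step for the weights
        beta = N * D / (R * _sq_dists(Phi @ W, T)).sum()  # M-step for precision

    return X, _responsibilities(Phi @ W, T, beta)
```

In this sketch, a hard cluster label for each data point can be taken as the most responsible latent node, `R.argmax(axis=0)`. Averaging the points assigned to each node would give a loose analogue of the reference-free class averages mentioned in the abstract.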
Pages: 25