Chord: an ensemble machine learning algorithm to identify doublets in single-cell RNA sequencing data

Cited by: 6
Authors
Xiong, Ke-Xu [1, 2]
Zhou, Han-Lin [2, 3, 4, 5, 6, 7]
Lin, Cong [2, 5, 6, 8]
Yin, Jian-Hua [2, 5, 6, 8]
Kristiansen, Karsten [2, 7]
Yang, Huan-Ming [2, 9]
Li, Gui-Bo [2, 3, 4, 5, 6, 8]
Affiliations
[1] Univ Chinese Acad Sci, Coll Life Sci, Beijing 100049, Peoples R China
[2] BGI Shenzhen, Shenzhen 518083, Peoples R China
[3] Zhengzhou Univ, BGI Coll, Zhengzhou, Peoples R China
[4] Zhengzhou Univ, Henan Inst Med & Pharmaceut Sci, Zhengzhou, Peoples R China
[5] BGI Shenzhen, BGI Henan, Xinxiang 453000, Henan, Peoples R China
[6] BGI Shenzhen, Shenzhen Key Lab Genom, Guangdong Prov Key Lab Human Dis Genom, Shenzhen 518083, Peoples R China
[7] Univ Copenhagen, Dept Biol, Lab Genom & Mol Biomed, DK-2100 Copenhagen, Denmark
[8] BGI Shenzhen, Shenzhen Key Lab Single Cell Omics, Shenzhen 518083, Peoples R China
[9] James D Watson Inst Genome Sci, Hangzhou 310008, Peoples R China
Keywords
SEQ
DOI
10.1038/s42003-022-03476-9
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification code
07; 0710; 09
Abstract
To address the unmet need for choosing a suitable doublet detection method, an ensemble machine learning algorithm called Chord was developed; it integrates multiple methods and achieves higher accuracy and stability on different scRNA-seq datasets. High-throughput single-cell RNA sequencing (scRNA-seq) is a popular method, but doublets (two or more cells captured together and sequenced as if they were one) disturb downstream analysis. Several computational approaches have been developed to detect doublets. However, most of these methods perform satisfactorily on some datasets but lack stability on others, so no single method can be regarded as a gold standard applicable to all scenarios, and choosing the most appropriate software is a difficult and time-consuming task for researchers. Here we propose Chord, which implements a machine learning algorithm that integrates multiple doublet detection methods to address these issues. Chord achieved higher accuracy and stability than the individual approaches on different datasets containing real and synthetic data. Moreover, Chord has a modular architecture that makes it flexible and readily adaptable to the incorporation of new tools. Chord is a general solution to the doublet detection problem.
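The abstract describes Chord's ensemble only at a high level. As a minimal sketch under stated assumptions, the Python code shows two ingredients commonly used by ensemble doublet detectors: generating synthetic doublets by summing the counts of two random cells, and training a meta-model that combines per-method doublet scores into one probability. The logistic-regression stacker, the function names (`simulate_doublets`, `fit_stacker`, `predict_doublet_prob`) and the toy scores are hypothetical illustrations, not Chord's actual implementation, which may use a different learner and simulation scheme.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = cells; columns = genes or base methods


def simulate_doublets(counts: Matrix, n_doublets: int, seed: int = 0) -> Matrix:
    """Build artificial doublets by summing the count profiles of two
    randomly chosen, distinct cells (requires at least two cells)."""
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(scores: Matrix, labels: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Hypothetical meta-model: logistic regression mapping per-method doublet
    scores (one column per base method) to a doublet probability, fitted by
    batch gradient descent on the mean log-loss."""
    n, n_feat = len(scores), len(scores[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(scores, labels):
            err = _sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - y
            for k in range(n_feat):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_doublet_prob(scores: Matrix, w: List[float], b: float) -> List[float]:
    """Combine per-method scores into one ensemble doublet probability per cell."""
    return [_sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in scores]


if __name__ == "__main__":
    # Toy scores from three hypothetical base methods (columns) for six cells;
    # label 1 marks a synthetic doublet, 0 an original cell.
    toy_scores = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
                  [0.1, 0.2, 0.3], [0.2, 0.1, 0.2], [0.3, 0.3, 0.1]]
    toy_labels = [1, 1, 1, 0, 0, 0]
    w, b = fit_stacker(toy_scores, toy_labels)
    print(predict_doublet_prob([[0.85, 0.75, 0.8], [0.15, 0.2, 0.2]], w, b))
```

In a fuller pipeline, each base method would score the original cells together with the simulated doublets. Because the original cells may themselves contain undetected doublets, the 0 labels are noisy, and logistic regression is only one of many possible meta-learners.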
Pages: 11
Journal: Communications Biology, 5