Intelligent phenotype-detection and gene expression profile generation with generative adversarial networks

Cited by: 2
Authors
Ravaee, Hamid [1]
Manshaei, Mohammad Hossein [1]
Safayani, Mehran [1]
Sartakhti, Javad Salimi [2]
Affiliations
[1] Isfahan Univ Technol, Dept Elect & Comp Engn, Esfahan 8415683111, Iran
[2] Univ Kashan, Dept Elect & Comp Engn, Kashan, Iran
Keywords
Generative adversarial networks; Gene expression; Augmentation of RNA-seq data; Cancer diagnosis; Cancer phenotype detection
DOI
10.1016/j.jtbi.2023.111636
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Gene expression analysis is valuable for classifying cancer types and identifying diverse cancer phenotypes. Recent high-throughput RNA sequencing platforms provide access to large volumes of gene expression data. However, developing machine learning classifiers for cancer types on these datasets raises several challenges, such as data security and privacy. To address these issues, we propose IP3G (Intelligent Phenotype-detection and Gene expression profile Generation with Generative adversarial network), a model based on generative adversarial networks (GANs). IP3G tackles two major problems: augmenting gene expression data and discovering phenotypes without supervision. By converting gene expression profiles into two-dimensional images and leveraging IP3G, we generate new profiles for specific phenotypes. IP3G learns disentangled representations of gene expression patterns and identifies phenotypes without labeled data. We improve the objective function of the GAN used in IP3G by employing the earth mover's distance and a novel mutual information function. IP3G outperforms clustering methods such as k-Means, DBSCAN, and GMM in unsupervised phenotype discovery, and augmenting gene expression profiles with IP3G improves the classification accuracy of SVM and CNN models by up to 6%. The source code for IP3G is publicly available on GitHub.
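The abstract does not say how a gene expression profile is laid out as a two-dimensional image, so the following is only a minimal sketch of one simple possibility: min-max scale the expression vector, zero-pad it to the next perfect square, and reshape it row by row. The function name `profile_to_image`, the scaling step, and the row-major gene ordering are all assumptions made for illustration and are not taken from IP3G.

```python
import math

import numpy as np


def profile_to_image(profile):
    """Map a 1-D expression vector to a square 2-D array (hypothetical layout).

    Assumption: genes are placed in row-major order and unused pixels are zero.
    The actual IP3G mapping may differ.
    """
    values = np.asarray(profile, dtype=np.float32)
    n = values.size

    # Smallest side length whose square holds all n genes.
    side = math.isqrt(n)
    if side * side < n:
        side += 1

    # Min-max scale to [0, 1]; a constant profile maps to all zeros.
    lo, hi = values.min(), values.max()
    if hi > lo:
        scaled = (values - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(values)

    # Zero-pad to side * side entries, then reshape to an image.
    padded = np.zeros(side * side, dtype=np.float32)
    padded[:n] = scaled
    return padded.reshape(side, side)


image = profile_to_image([0.0, 5.0, 10.0, 2.5, 7.5])
print(image.shape)
```

A layout like this keeps every gene at a fixed pixel position across samples, which is what a convolutional generator and discriminator need; whether neighbouring pixels should also correspond to related genes is a separate design choice that this sketch does not address.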
Pages: 15