A Novel Machine-Learning Approach to Predict Stress-Responsive Genes in Arabidopsis

Cited by: 2
Authors
Nazari, Leyla [1]
Ghotbi, Vida [2]
Nadimi, Mohammad [3]
Paliwal, Jitendra [3]
Affiliations
[1] Agr Res Educ & Extens Org AREEO, Fars Agr & Nat Resources Res & Educ Ctr, Crop & Hort Sci Res Dept, Shiraz 7155863511, Iran
[2] Agr Res Educ & Extens Org AREEO, Seed & Plant Improvement Inst, Karaj 3135933151, Iran
[3] Univ Manitoba, Dept Biosyst Engn, Winnipeg, MB R3T 5V6, Canada
Keywords
LASSO; information gain; ReliefF; classifiers; random forest; SELECTION; TRANSCRIPTOMICS; EXPRESSION; TIME
DOI
10.3390/a16090407
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study proposes a hybrid gene selection method to identify and predict key genes in Arabidopsis associated with various stresses (including salt, heat, cold, high-light, and flagellin), aiming to enhance crop tolerance. An open-source microarray dataset (GSE41935) comprising 207 samples and 30,380 genes was analyzed using several machine learning tools, including the synthetic minority oversampling technique (SMOTE), information gain (IG), ReliefF, and the least absolute shrinkage and selection operator (LASSO), along with various classifiers (BayesNet, logistic regression, multilayer perceptron, sequential minimal optimization (SMO), and random forest). We identified 439 differentially expressed genes (DEGs), of which only three were down-regulated (AT3G20810, AT1G31680, and AT1G30250). The performance of the top 20 genes selected by IG and ReliefF was evaluated using the classifiers mentioned above to classify stressed versus non-stressed samples. The random forest algorithm outperformed the other algorithms, with accuracies of 97.91% and 98.51% for IG and ReliefF, respectively. Additionally, 42 genes were identified from all 30,380 genes using LASSO regression. The top 20 genes from each feature selection method were compared to identify three common genes (AT5G44050, AT2G47180, and AT1G70700), which formed a three-gene signature. The efficiency of these three genes was evaluated using the random forest and XGBoost algorithms. Further validation was performed using an independent RNA-seq dataset and random forest. These gene signatures can be exploited in plant breeding to improve stress tolerance in a variety of crops.
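The consensus step described in the abstract (rank genes with several selectors, keep each selector's top-k list, then intersect the lists) can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene names and expression values are made-up toy data, the median-based binarization used for information gain is an assumption chosen for simplicity, and the ReliefF and LASSO lists are hard-coded placeholders rather than computed results.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one gene after binarizing its expression at the median (assumption)."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    high = [y for x, y in zip(values, labels) if x >= median]
    low = [y for x, y in zip(values, labels) if x < median]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (high, low) if part)
    return entropy(labels) - conditional


def top_k(scores, k):
    """Names of the k highest-scoring genes."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def common_signature(rankings):
    """Genes that appear in every selector's top-k list."""
    return sorted(set.intersection(*(set(r) for r in rankings)))


# Hypothetical toy data: 1 = stressed, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 8.9, 2.1, 2.5, 1.9, 2.2],  # separates the classes
    "gene_b": [5.0, 2.0, 5.1, 2.1, 5.2, 2.2, 4.9, 1.9],  # uninformative
    "gene_c": [7.0, 7.2, 6.9, 3.0, 3.1, 2.9, 3.2, 3.3],  # partly informative
}

ig_scores = {gene: information_gain(v, labels) for gene, v in expression.items()}
ig_top = top_k(ig_scores, k=2)

# Placeholder top-k lists standing in for ReliefF and LASSO output.
relieff_top = ["gene_a", "gene_c"]
lasso_top = ["gene_a", "gene_b"]

signature = common_signature([ig_top, relieff_top, lasso_top])
print(ig_top)
print(signature)
```

In a real analysis the placeholder lists would come from actual ReliefF and LASSO runs on the expression matrix, and the resulting signature would then be passed to a classifier such as random forest for evaluation.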
Pages: 14