Disentangled global and local features of multi-source data variational autoencoder: An interpretable model for diagnosing IgAN via multi-source Raman spectral fusion techniques

Times cited: 0
Authors
Shuai, Wei [1]
Tian, Xuecong [2]
Zuo, Enguang [2]
Zhang, Xueqin [3]
Lu, Chen [4]
Gu, Jin [5,6]
Chen, Chen [2]
Lv, Xiaoyi [1]
Chen, Cheng [1]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830046, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] Peoples Hosp Xinjiang Uygur Autonomous Reg, Dept Nephrol, Urumqi 830001, Xinjiang, Peoples R China
[4] Xinjiang Med Univ, Affiliated Hosp 1, Dept Nephrol, Urumqi 830011, Xinjiang, Peoples R China
[5] Tsinghua Univ, Inst Precis Med, BNRIST Bioinformat Div, MOE,Key Lab Bioinformat, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
IgAN; Raman spectroscopy; Multi-source data fusion; Encoder decoupling; SHAP; SPECTROSCOPY; CLASSIFICATION; URINE
DOI
10.1016/j.artmed.2024.103053
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A single Raman spectrum captures only limited molecular information, so effectively fusing Raman spectra from the serum and urine source domains can yield richer features. However, most current Raman spectroscopy studies of immunoglobulin A nephropathy (IgAN) rely on small samples and spectra with a low signal-to-noise ratio, and directly applying a multi-source data fusion strategy under these conditions may even reduce diagnostic accuracy. To address this, this paper proposes a variational autoencoder-based data augmentation and spectral optimization method that produces reconstructed Raman spectra with twice the sample size and an improved signal-to-noise ratio. For diagnosing IgAN from multi-source Raman spectra, the paper then builds a variational autoencoder that disentangles the global and local features of multi-source data (DMSGL-VAE). First, statistical features are extracted from the segmented spectra, and a decoupling module splits the latent variables produced by the variational encoder into a global representation, which captures the information shared by the serum and urine source domains, and a local representation, which captures the information unique to each domain. Next, a cross-source reconstruction loss and a decoupling loss constrain this disentanglement, whose effectiveness is demonstrated both quantitatively and qualitatively. Finally, the features from the different source domains are fused to diagnose IgAN, and the most important features are analyzed with the SHapley Additive exPlanations (SHAP) algorithm. Experimental results show that the DMSGL-VAE model achieves an AUC of 0.9958 for diagnosing IgAN on the test set. SHAP analysis further suggests that proteins, hydroxybutyrate, and guanine are likely common biological fingerprint substances for diagnosing IgAN from serum and urine Raman spectra.
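The abstract does not give the architecture, the segment statistics, or the loss weights, so the following is only a minimal pure-Python sketch, under stated assumptions, of how a decoupled multi-source VAE objective of this kind is commonly assembled. It assumes: (i) each spectral segment is summarized by its mean, standard deviation, minimum, and maximum; (ii) the first `n_global` latent dimensions form the global (shared) code and the rest form the local (source-specific) code; (iii) cross-source reconstruction swaps the global codes between serum and urine; and (iv) the decoupling loss is a mean squared distance between the two global codes. All names (`segment_statistics`, `reparameterize`, `decoupled_losses`, etc.) are hypothetical, and the decoders are placeholders for trained networks. `reparameterize` shows the generic VAE sampling step by which decoded draws can serve as additional reconstructed spectra; the authors' exact augmentation procedure is not described here.

```python
import math
import random
from statistics import mean, pstdev


def segment_statistics(spectrum, n_segments):
    """Split a 1-D spectrum into n_segments equal chunks and return
    [mean, std, min, max] per chunk (the choice of statistics is assumed).
    Trailing points that do not fill a whole chunk are dropped."""
    size = len(spectrum) // n_segments
    feats = []
    for k in range(n_segments):
        seg = spectrum[k * size:(k + 1) * size]
        feats.extend([mean(seg), pstdev(seg), min(seg), max(seg)])
    return feats


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for a single sample."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def decoupled_losses(x_s, x_u, z_s, z_u, dec_s, dec_u, n_global):
    """Self-reconstruction, cross-source reconstruction, and decoupling terms
    for one serum (s) / urine (u) pair. z[:n_global] is treated as the
    global code and z[n_global:] as the local code (assumed split)."""
    g_s, l_s = z_s[:n_global], z_s[n_global:]
    g_u, l_u = z_u[:n_global], z_u[n_global:]
    # Each source rebuilt from its own global + local code.
    self_rec = mse(dec_s(g_s + l_s), x_s) + mse(dec_u(g_u + l_u), x_u)
    # Cross-source: rebuild each source from the OTHER source's global code
    # plus its own local code; this only works if the global code is shared.
    cross_rec = mse(dec_s(g_u + l_s), x_s) + mse(dec_u(g_s + l_u), x_u)
    # Decoupling: pull the two global codes together (assumed form).
    decouple = mse(g_s, g_u)
    return {"self_rec": self_rec, "cross_rec": cross_rec, "decouple": decouple}
```

With identity decoders, `decoupled_losses([1, 2], [3, 5], [1, 2], [3, 5], identity, identity, 1)` gives `self_rec == 0`, `cross_rec == 4.0`, and `decouple == 4.0`, because the two global codes (1 and 3) disagree. A full training objective would also add `kl_to_standard_normal` for each source and weight the terms, which the abstract does not specify.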
In summary, in terms of classification performance, the Raman spectroscopy-based DMSGL-VAE model designed in this paper can achieve rapid, non-invasive, and accurate screening for IgAN, and its interpretable analysis may help physicians better understand IgAN and adopt more efficient diagnostic measures in the future.
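The abstract reports using the SHAP algorithm to rank important features. As background only, the sketch below computes exact Shapley values by enumerating every feature subset, treating a feature outside a coalition as set to a baseline value. This is the quantity SHAP estimates, not the `shap` package and not the authors' code; the name `exact_shapley` and the single-baseline convention are assumptions. Its cost grows as 2^n, so it is practical only for a handful of features, which is why SHAP implementations rely on approximations for spectra with many features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley value of each feature of x for a scalar-valued model.
    Features outside a coalition are replaced by their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For the additive model `f(z) = 2*z[0] + 3*z[1]` with `x = [1, 1]` and baseline `[0, 0]`, the attributions are `[2.0, 3.0]`. In general the values sum to `f(x) - f(baseline)` (the efficiency property), which is what lets SHAP attribute a single prediction across input features.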
Pages: 18