Disentangled global and local features of multi-source data variational autoencoder: An interpretable model for diagnosing IgAN via multi-source Raman spectral fusion techniques

Cited by: 0
Authors
Shuai, Wei [1]
Tian, Xuecong [2]
Zuo, Enguang [2]
Zhang, Xueqin [3]
Lu, Chen [4]
Gu, Jin [5,6]
Chen, Chen [2]
Lv, Xiaoyi [1]
Chen, Cheng [1]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830046, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] Peoples Hosp Xinjiang Uygur Autonomous Reg, Dept Nephrol, Urumqi 830001, Xinjiang, Peoples R China
[4] Xinjiang Med Univ, Affiliated Hosp 1, Dept Nephrol, Urumqi 830011, Xinjiang, Peoples R China
[5] Tsinghua Univ, Inst Precis Med, BNRIST Bioinformat Div, MOE,Key Lab Bioinformat, Beijing 100084, Peoples R China
[6] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
IgAN; Raman spectroscopy; Multi-source data fusion; Encoder decoupling; SHAP; SPECTROSCOPY; CLASSIFICATION; URINE
DOI
10.1016/j.artmed.2024.103053
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A single Raman spectrum reflects limited molecular information. Effectively fusing Raman spectra from the serum and urine source domains yields richer feature information. However, most current Raman-based studies of immunoglobulin A nephropathy (IgAN) rely on small samples and on spectra with a low signal-to-noise ratio, so applying a multi-source data fusion strategy directly may even reduce diagnostic accuracy. To address this, the paper proposes a data enhancement and spectral optimization method based on variational autoencoders that produces reconstructed Raman spectra with a doubled sample size and an improved signal-to-noise ratio. For diagnosing IgAN from multi-source Raman spectra, the paper builds a variational autoencoder model with decoupled global and local features of multi-source data (DMSGL-VAE). First, statistical features are extracted from the segmented spectra, and the latent variables produced by the variational encoder are separated by a decoupling module. The resulting global and local representations capture, respectively, the information shared across the serum and urine source domains and the information unique to each. Next, a cross-source reconstruction loss and a decoupling loss constrain the decoupling, whose effectiveness is demonstrated both quantitatively and qualitatively. Finally, the features of the different source domains are integrated to diagnose IgAN, and the important features behind the results are analyzed with the SHapley Additive exPlanations (SHAP) algorithm. In the experiments, the DMSGL-VAE model reached an AUC of 0.9958 for diagnosing IgAN on the test set. The SHAP analysis further indicates that proteins, hydroxybutyrate, and guanine are likely to be common biological fingerprint substances for diagnosing IgAN by serum and urine Raman spectroscopy.
In summary, the Raman-spectroscopy-based DMSGL-VAE model designed in this paper can screen for IgAN rapidly, non-invasively, and accurately in terms of classification performance. The interpretability analysis may also help doctors better understand IgAN and diagnose it more efficiently in the future.
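The abstract does not give the model's equations, so the following is only a minimal sketch of three ideas it names: per-segment statistical features, splitting a latent vector into global and local parts, and a cross-source reconstruction loss. Everything here is an assumption for illustration, not the authors' implementation: the function names, the choice of mean and standard deviation as the statistics, the fixed split point `n_global`, the mean-squared-error loss, and the identity decoders used in the demo are all hypothetical.

```python
import numpy as np


def segment_stats(spectrum, n_segments):
    """Split a 1-D spectrum into contiguous segments and return
    [mean, std] for each segment as one flat feature vector.
    (Mean and std are assumed statistics; the paper may use others.)"""
    segments = np.array_split(np.asarray(spectrum, dtype=float), n_segments)
    return np.concatenate([[seg.mean(), seg.std()] for seg in segments])


def split_latent(z, n_global):
    """Assumed decoupling: the first n_global entries are treated as the
    global (shared) part, the remainder as the local (source-specific) part."""
    z = np.asarray(z, dtype=float)
    return z[:n_global], z[n_global:]


def cross_source_loss(x_a, x_b, z_a, z_b, n_global, decode_a, decode_b):
    """Reconstruct each source from the OTHER source's global part plus its
    own local part, and sum the two mean-squared errors. If the global parts
    really hold shared information, swapping them should cost little."""
    g_a, l_a = split_latent(z_a, n_global)
    g_b, l_b = split_latent(z_b, n_global)
    rec_a = decode_a(np.concatenate([g_b, l_a]))
    rec_b = decode_b(np.concatenate([g_a, l_b]))
    return float(np.mean((rec_a - x_a) ** 2) + np.mean((rec_b - x_b) ** 2))


# Toy demo with identity "decoders" (hypothetical stand-ins for trained networks).
identity = lambda z: z
serum = np.array([1.0, 2.0, 3.0, 4.0])
urine_shared = np.array([1.0, 2.0, 7.0, 8.0])   # same global part as serum
urine_differs = np.array([5.0, 6.0, 7.0, 8.0])  # different global part

features = segment_stats(np.arange(12), 3)
loss_shared = cross_source_loss(serum, urine_shared, serum, urine_shared,
                                2, identity, identity)
loss_differs = cross_source_loss(serum, urine_differs, serum, urine_differs,
                                 2, identity, identity)
```

In this toy setup the loss is zero when the two sources agree on the global part and grows when they disagree, which is the kind of pressure a cross-source reconstruction term would put on the shared representation.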
Pages: 18