How data heterogeneity affects innovating knowledge and information in gene identification: A statistical learning perspective

Authors
Zhao, Jun [1 ]
Lao, Fangyi [1 ]
Yan, Guan'ao [2 ]
Zhang, Yi [3 ]
Affiliations
[1] Hangzhou City Univ, Dept Stat & Data Sci, Hangzhou, Peoples R China
[2] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA
[3] Zhejiang Univ, Sch Math Sci, Hangzhou, Peoples R China
Source
JOURNAL OF INNOVATION & KNOWLEDGE | 2024 / Volume 9 / Issue 03
Keywords
Data heterogeneity; Gene identification; Statistical learning; Semiparametric modelling; NONCONCAVE PENALIZED LIKELIHOOD; QUANTILE REGRESSION; SELECTION; MODEL;
DOI
10.1016/j.jik.2024.100514
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
Data heterogeneity, particularly noted in fields such as genetics, has been identified as a key feature of big data, posing significant challenges to innovation in knowledge and information. This paper focuses on characterizing and understanding the so-called "curse of heterogeneity" in gene identification for low infant birth weight from a statistical learning perspective. Owing to the computational and analytical advantages of expectile regression in handling heterogeneity, this paper proposes a flexible, regularized, partially linear additive expectile regression model for high-dimensional heterogeneous data. Unlike most existing works that assume Gaussian or sub-Gaussian error distributions, we adopt a more realistic, less stringent assumption that the errors have only finite moments. Additionally, we derive a two-step algorithm to address the reduced optimization problem and demonstrate that our method, with a probability approaching one, achieves optimal estimation accuracy. Furthermore, we demonstrate that the proposed algorithm converges at least linearly, ensuring the practical applicability of our method. Monte Carlo simulations reveal that our method's resulting estimator performs well in terms of estimation accuracy, model selection, and heterogeneity identification. Empirical analysis in gene trait expression further underscores the potential for guiding public health interventions. © 2024 The Authors. Published by Elsevier España, S.L.U. on behalf of Journal of Innovation & Knowledge. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
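Illustrative note (not part of the original record): the abstract builds on expectile regression, which replaces the squared loss with an asymmetric squared loss weighted by |tau - 1(r < 0)|. The sketch below shows only that basic loss and the resulting sample expectile in an intercept-only setting. It is not the paper's regularized partially linear additive model or its two-step algorithm, and the function names `expectile_loss` and `sample_expectile` are hypothetical.

```python
import numpy as np


def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss: |tau - 1(r < 0)| * r**2."""
    r = np.asarray(residuals, dtype=float)
    # Negative residuals get weight (1 - tau), non-negative ones get tau.
    weights = np.where(r < 0, 1.0 - tau, tau)
    return float(np.mean(weights * r ** 2))


def sample_expectile(y, tau, tol=1e-10, max_iter=200):
    """Sample tau-expectile via iteratively reweighted averaging.

    This is the intercept-only special case of expectile regression;
    tau = 0.5 recovers the ordinary sample mean.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(max_iter):
        # Weights depend on which side of the current estimate each point lies.
        w = np.where(y < m, 1.0 - tau, tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return float(m_new)
        m = m_new
    return float(m)


# Heavy-tailed sample (Student t, 3 degrees of freedom): finite low-order
# moments but not sub-Gaussian, loosely mirroring the abstract's error assumption.
rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=1000)
low, mid, high = (sample_expectile(y, t) for t in (0.1, 0.5, 0.9))
```

Comparing expectiles at several values of tau is one simple way to see heterogeneity: if the spread between `low` and `high` changes with a covariate, the conditional distribution is not a pure location shift.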
Pages: 10
Related papers
27 in total
  • [1] A Bayesian perspective of statistical machine learning for big data
    Sambasivan, Rajiv
    Das, Sourish
    Sahu, Sujit K.
    COMPUTATIONAL STATISTICS, 2020, 35 (03) : 893 - 930
  • [2] Innovating knowledge and information for a firm-level automobile demand forecast system: A machine learning perspective
    Kim, Sehoon
    JOURNAL OF INNOVATION & KNOWLEDGE, 2023, 8 (02):
  • [3] A statistical learning perspective on switched linear system identification
    Massucci, Louis
    Lauer, Fabien
    Gilson, Marion
    AUTOMATICA, 2022, 145
  • [4] How uncertainty affects information search among consumers: a curvilinear perspective
    He, Sharlene
    Rucker, Derek D.
    MARKETING LETTERS, 2023, 34 (03) : 415 - 428
  • [5] A Bayesian perspective of statistical machine learning for big data
    Rajiv Sambasivan
    Sourish Das
    Sujit K. Sahu
    Computational Statistics, 2020, 35 : 893 - 930
  • [6] The choice of reference gene affects statistical efficiency in quantitative PCR data analysis
    Guo, Yi
    Pennell, Michael L.
    Pearl, Dennis K.
    Knobloch, Thomas J.
    Fernandez, Soledad
    Weghorst, Christopher M.
    BIOTECHNIQUES, 2013, 55 (04) : 207 - 209
  • [7] An embedded method for gene identification problems involving unwanted data heterogeneity
    Meng Lu
    Human Genomics, 13
  • [8] An embedded method for gene identification in heterogenous data involving unwanted heterogeneity
    Lu, Meng
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 242 - 247
  • [9] An embedded method for gene identification problems involving unwanted data heterogeneity
    Lu, Meng
    HUMAN GENOMICS, 2019, 13 (Suppl 1) : 45
  • [10] How prior knowledge affects selective attention during category learning: An eyetracking study
    Kim, ShinWoo
    Rehder, Bob
    MEMORY & COGNITION, 2011, 39 (04) : 649 - 665