Domain adaptation in small-scale and heterogeneous biological datasets

Cited by: 2
Authors
Orouji, Seyedmehdi [1]
Liu, Martin C. [2, 3]
Korem, Tal [3, 4, 5]
Peters, Megan A. K. [1, 5, 6]
Affiliations
[1] Univ Calif Irvine, Dept Cognit Sci, Irvine, CA 92697 USA
[2] Columbia Univ Irving Med Ctr, Dept Biomed Informat, New York, NY USA
[3] Columbia Univ Irving Med Ctr, Dept Syst Biol, Program Math Genom, New York, NY 10032 USA
[4] Columbia Univ Irving Med Ctr, Dept Obstet & Gynecol, New York, NY 10032 USA
[5] CIFAR, CIFAR Azrieli Global Scholars program, Toronto, ON, Canada
[6] CIFAR, Program Brain Mind & Consciousness, Toronto, ON, Canada
Source
SCIENCE ADVANCES | 2024 / Vol. 10 / Issue 51
Keywords
CLASSIFICATION; IDENTIFICATION; VISUALIZATION; INFERENCE; SOFTWARE; NETWORK; MIXTURE; IMPACT; KERNEL
DOI
10.1126/sciadv.adp6040
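Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract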
Machine-learning models are key to modern biology, yet models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories due to both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets are smaller and have more features, and these are also complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for the incorporation of domain adaptation techniques to the computational biologist's toolkit, with further development of customized approaches.
Pages: 14
Related papers
200 total
  • [1] Abdill RJ, 2023, bioRxiv, DOI 10.1101/2023.10.11.560955
  • [2] Ahmed M, 2019, IEEE INT C SEMANT CO, P224, DOI [10.1109/ICSC.2019.00050, 10.1109/ICOSC.2019.8665584]
  • [3] Ajakan H, 2015, arXiv, arXiv:1412.4446
  • [4] The curse(s) of dimensionality
    Altman, Naomi
    Krzywinski, Martin
    [J]. NATURE METHODS, 2018, 15 (06) : 399 - 400
  • [5] Achieving pan-microbiome biological insights via the dbBact knowledge base
    Amir, Amnon
    Ozel, Eitan
    Haberman, Yael
    Shental, Noam
    [J]. NUCLEIC ACIDS RESEARCH, 2023, 51 (13) : 6593 - 6608
  • [6] Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns
    Amir, Amnon
    McDonald, Daniel
    Navas-Molina, Jose A.
    Kopylova, Evguenia
    Morton, James T.
    Xu, Zhenjiang Zech
    Kightley, Eric P.
    Thompson, Luke R.
    Hyde, Embriette R.
    Gonzalez, Antonio
    Knight, Rob
    [J]. MSYSTEMS, 2017, 2 (02)
  • [7] Impact of functional MRI data preprocessing pipeline on default-mode network detectability in patients with disorders of consciousness
    Andronache, Adrian
    Rosazza, Cristina
    Sattin, Davide
    Leonardi, Matilde
    D'Incerti, Ludovico
    Minati, Ludovico
    [J]. FRONTIERS IN NEUROINFORMATICS, 2013, 7
  • [8] Arik SO, 2021, AAAI CONF ARTIF INTE, V35, P6679
  • [9] Arpit D, 2017, PR MACH LEARN RES, V70
  • [10] Austin GI, 2024, bioRxiv, DOI 10.1101/2024.02.09.579716