Data leakage jeopardizes ecological applications of machine learning

Cited by: 9
Authors
Stock, Andy [1]
Gregr, Edward J. [1, 2]
Chan, Kai M. A. [1]
Affiliations
[1] Univ British Columbia, Inst Resources Environm & Sustainabil, Vancouver, BC, Canada
[2] SciTech Environm Consulting, Vancouver, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
VALIDATION;
DOI
10.1038/s41559-023-02162-1
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)];
Discipline classification code
071012; 0713;
Abstract
Machine learning is a popular tool in ecology but many scientific applications suffer from data leakage, causing misleading results. We highlight common pitfalls in ecological machine-learning methods and argue that discipline-specific model info sheets must be developed to aid in model evaluations.
Pages: 1743-1745
Number of pages: 3