Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice

Times cited: 3
Author
Li, Dan [1]
Affiliation
[1] CUNY, Baruch Coll, Philosophy Dept, New York, NY 10031 USA
Keywords
Induction; Machine learning; Data ontology; No Free Lunch theorem; Goodman's riddle of induction; CLIMATE; MODELS;
DOI
10.1007/s11023-023-09639-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As scientists start to adopt machine learning (ML) as one research tool, the security of ML and the knowledge generated become a concern. In this paper, I explain how supervised ML can be improved with better data ontology, or the way we make categories and turn information into data. More specifically, we should design data ontology in such a way that is consistent with the knowledge that we have about the target phenomenon so that such ontology can help us make the inductive leap. I do so by thinking through a thought experiment, Goodman's New Riddle of Induction (Fact, fiction, and forecast, Harvard University Press, 1955). Goodman's riddle helps flesh out three problems of induction: (1) the problem of equal goodies, that there are often too many equally good inductive results given the same data; (2) the problem of diverging performance, that these equally good results can give opposite predictions in the future; and (3) the problem of mediocrity, that when averaged across all equally possible datasets and tasks, no inductive algorithm outperforms any other. I show that all these three problems are manifested as real obstacles in ML practice, namely, the Rashomon effect (Breiman in Stat Sci 16(3):199-231, 2001), the problem of underspecification (D'Amour et al. in J Mach Learn Res, 2020, https://doi.org/10.48550/arXiv.2011.03395), and the No Free Lunch theorem (Wolpert in Neural Comput 8(7):1341-90, 1996, https://doi.org/10.1162/neco.1996.8.7.1341). Lastly, I argue that proper data ontology can help mitigate these problems and I demonstrate how using concrete examples from climate science. This research highlights the links between philosophers' discussions of induction and implications in ML practice.
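As a rough illustration of problems (1) and (2) in the abstract, the Python sketch below encodes Goodman's "green" and "grue" hypotheses as two predictors. It is not taken from the paper: the cutoff time `T`, the function names, and the toy observations are all assumptions made for illustration. Both hypotheses fit the observed data equally well, yet they give opposite predictions once the cutoff has passed.

```python
# Hypothetical toy setup: emeralds examined at integer times 0..T-1,
# all of them observed to be green. T is an assumed cutoff time.
T = 10
observed = [(t, "green") for t in range(T)]


def predicts_green(t):
    """Hypothesis A: every emerald is green, whenever it is examined."""
    return "green"


def predicts_grue(t):
    """Hypothesis B: every emerald is 'grue', i.e. green if examined
    before time T and blue otherwise."""
    return "green" if t < T else "blue"


def accuracy(hypothesis, data):
    """Fraction of observations the hypothesis gets right."""
    return sum(hypothesis(t) == colour for t, colour in data) / len(data)


# Problem (1): both hypotheses are equally good on the data we have.
acc_green = accuracy(predicts_green, observed)
acc_grue = accuracy(predicts_grue, observed)

# Problem (2): they diverge on an emerald examined after the cutoff.
future_green = predicts_green(T + 1)
future_grue = predicts_grue(T + 1)

print(acc_green, acc_grue, future_green, future_grue)
```

Nothing in the observed data separates the two hypotheses; only background knowledge about which categories are appropriate can do so, which is the role the abstract assigns to data ontology.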
Pages: 429-450
Page count: 22
Related papers
35 items in total
  • [21] Public Budget Simulations with Machine Learning and Synthetic Data: Some Challenges and Lessons from the Mexican Case
    Valle-Cruz, David
    Fernandez-Cortez, Vanessa
    Lopez-Chau, Asdrubal
    Rojas-Hernandez, Rafael
    ELECTRONIC GOVERNANCE WITH EMERGING TECHNOLOGIES, EGETC 2022, 2022, 1666 : 141 - 160
  • [22] Applying Machine-Learning Methods to Laser Acceleration of Protons: Lessons Learned From Synthetic Data
    Desai, Ronak
    Zhang, Thomas
    Felice, John J.
    Oropeza, Ricky
    Smith, Joseph R.
    Kryshchenko, Alona
    Orban, Chris
    Dexter, Michael L.
    Patnaik, Anil K.
    CONTRIBUTIONS TO PLASMA PHYSICS, 2025, 65 (03)
  • [23] The value of data, machine learning, and deep learning in restaurant demand forecasting: Insights and lessons learned from a large restaurant chain
    Chae, Bongsug
    Sheu, Chwen
    Park, Eunhye Olivia
    DECISION SUPPORT SYSTEMS, 2024, 184
  • [24] Exploring the impact of safety culture on incident reporting: Lessons learned from machine learning analysis of NHS England staff survey and incident data
    Kaya, G. K.
    Ustebay, S.
    Nixon, J.
    Pilbeam, C.
    Sujan, M.
    SAFETY SCIENCE, 2023, 166
  • [25] Is Machine Learning a Better Way to Identify COVID-19 Patients Who Might Benefit from Hydroxychloroquine Treatment?-The IDENTIFY Trial
    Burdick, Hoyt
    Lam, Carson
    Mataraso, Samson
    Siefkas, Anna
    Braden, Gregory
    Dellinger, R. Phillip
    McCoy, Andrea
    Vincent, Jean-Louis
    Green-Saxena, Abigail
    Barnes, Gina
    Hoffman, Jana
    Calvert, Jacob
    Pellegrini, Emily
    Das, Ritankar
    JOURNAL OF CLINICAL MEDICINE, 2020, 9 (12) : 1 - 18
  • [26] The Contribution of Artificial Intelligence in Achieving the Sustainable Development Goals (SDGs): What Can Eye Health Can Learn From Commercial Industry and Early Lessons From the Application of Machine Learning in Eye Health Programmes
    Sawers, Nicholas
    Bolster, Nigel
    Bastawrous, Andrew
    FRONTIERS IN PUBLIC HEALTH, 2021, 9
  • [27] What Machine Learning Can Learn from Foresight: A Human-Centered Approach. For machine learning-based forecast efforts to succeed, they must embrace lessons from corporate foresight to address human and organizational challenges.
    Crews, Christian
    RESEARCH-TECHNOLOGY MANAGEMENT, 2019, 62 (01) : 30 - 33
  • [28] Smaller is better? Unduly nice accuracy assessments in roof detection using remote sensing data with machine learning and k-fold cross-validation
    Abriha, David
    Srivastava, Prashant K.
    Szabo, Szilard
    HELIYON, 2023, 9 (03)
  • [29] Towards Better Receptor-Ligand Prioritization: How Machine Learning on Protein-Protein Interaction Data Can Provide Insight Into Receptor-Ligand Pairs
    Iacucci, Ernesto
    Moreau, Yves
    ARTIFICIAL NEURAL NETWORKS-ICANN 2010, PT I, 2010, 6352 : 267 - 271
  • [30] Free interchange for better transit? Assessing the multi-dimensional impacts on metro to bus interchange behavior - insights from an explainable machine learning method
    Gu, Tianqi
    Zhang, Kaihan
    Xu, Weiping
    Zhuang, Chutian
    Jiang, Zhonghui
    Kim, Inhi
    Chung, Hyungchul
    TRAVEL BEHAVIOUR AND SOCIETY, 2025, 38