Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice

Cited by: 3

Authors:

Li, Dan [1]

Affiliation:

[1] CUNY, Baruch Coll, Philosophy Dept, New York, NY 10031 USA

Source:

MINDS AND MACHINES | 2023 / Vol. 33 / Issue 03

Keywords:

Induction; Machine learning; Data ontology; No Free Lunch theorem; Goodman's riddle of induction; CLIMATE; MODELS;

DOI:

10.1007/s11023-023-09639-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As scientists start to adopt machine learning (ML) as one research tool, the security of ML and the knowledge generated become a concern. In this paper, I explain how supervised ML can be improved with better data ontology, or the way we make categories and turn information into data. More specifically, we should design data ontology in such a way that is consistent with the knowledge that we have about the target phenomenon so that such ontology can help us make the inductive leap. I do so by thinking through a thought experiment, Goodman's New Riddle of Induction (Fact, fiction, and forecast, Harvard University Press, 1955). Goodman's riddle helps flesh out three problems of induction: (1) the problem of equal goodies, that there are often too many equally good inductive results given the same data; (2) the problem of diverging performance, that these equally good results can give opposite predictions in the future; and (3) the problem of mediocrity, that when averaged across all equally possible datasets and tasks, no inductive algorithm outperforms any other. I show that all these three problems are manifested as real obstacles in ML practice, namely, the Rashomon effect (Breiman in Stat Sci 16(3):199-231, 2001), the problem of underspecification (D'Amour et al. in J Mach Learn Res, 2020, https://doi.org/10.48550/arXiv.2011.03395), and the No Free Lunch theorem (Wolpert in Neural Comput 8(7):1341-90, 1996, https://doi.org/10.1162/neco.1996.8.7.1341). Lastly, I argue that proper data ontology can help mitigate these problems and I demonstrate how using concrete examples from climate science. This research highlights the links between philosophers' discussions of induction and implications in ML practice.
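The first two problems named in the abstract can be illustrated with a toy version of Goodman's riddle. The sketch below is not taken from the paper; the cutoff `T`, the two hypothesis functions, and the `accuracy` helper are all hypothetical names chosen for illustration. It shows two hypotheses that fit the same observations equally well and still disagree on every unobserved case.

```python
# Toy illustration of Goodman's "grue" riddle.
# All names and values here are assumptions made for illustration only.

T = 10  # hypothetical cutoff between observed past and unobserved future


def green_hypothesis(t):
    """All emeralds are green, whenever they are observed."""
    return "green"


def grue_hypothesis(t):
    """Emeralds are 'grue': green if observed before T, blue otherwise."""
    return "green" if t < T else "blue"


def accuracy(hypothesis, observations):
    """Fraction of (time, colour) observations the hypothesis predicts correctly."""
    hits = sum(hypothesis(t) == colour for t, colour in observations)
    return hits / len(observations)


# Observed data: every emerald seen before T was green.
observed = [(t, "green") for t in range(T)]

# Both hypotheses fit the observed data equally well (equally good results).
train_acc_green = accuracy(green_hypothesis, observed)
train_acc_grue = accuracy(grue_hypothesis, observed)

# After T the two hypotheses give opposite predictions (diverging performance).
future_times = range(T, T + 5)
disagreements = sum(green_hypothesis(t) != grue_hypothesis(t) for t in future_times)

print(train_acc_green, train_acc_grue, disagreements)
```

Nothing in the observed data separates the two hypotheses; only background knowledge about which categories are sensible for the target phenomenon does, which is the role the abstract assigns to data ontology.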


Pages: 429-450

Number of pages: 22

Related records (35 in total):

[1] Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice
Dan Li
Minds and Machines, 2023, 33 : 429 - 450
[2] Cognition-Enhanced Machine Learning for Better Predictions with Limited Data
Sense, Florian
Wood, Ryan
Collins, Michael G.
Fiechter, Joshua
Wood, Aihua
Krusmark, Michael
Jastrzembski, Tiffany
Myers, Christopher W.
TOPICS IN COGNITIVE SCIENCE, 2022, 14 (04) : 739 - 755
[3] Can machine learning on economic data better forecast the unemployment rate?
Kreiner, Aaron
Duca, John V.
APPLIED ECONOMICS LETTERS, 2020, 27 (17) : 1434 - 1437
[4] Better, Not Just More: Data-centric machine learning for Earth observation
Roscher, Ribana
Russwurm, Marc
Gevaert, Caroline
Kampffmeyer, Michael
Dos Santos, Jefersson A.
Vakalopoulou, Maria
Haensch, Ronny
Hansen, Stine
Nogueira, Keiller
Prexl, Jonathan
Tuia, Devis
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2024, 12 (04) : 335 - 355
[5] Machine learning in neurology: what neurologists can learn from machines and vice versa
Rose Bruffaerts
Journal of Neurology, 2018, 265 : 2745 - 2748
[6] Machine learning in neurology: what neurologists can learn from machines and vice versa
Bruffaerts, Rose
JOURNAL OF NEUROLOGY, 2018, 265 (11) : 2745 - 2748
[7] Machine Learning for Automatic Encoding of French Electronic Medical Records: Is More Data Better ?
Gobeill, Julien
Ruch, Patrick
Meyer, Rodolphe
DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 312 - 316
[8] Data mining and machine learning in retail business: developing efficiencies for better customer retention
Kumar, M. Rajesh
Venkatesh, J.
Rahman, A. M. J. Md Zubair
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021,
[9] Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning
Jo, Eun Seo
Gebru, Timnit
FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, : 306 - 316
[10] Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning
Gebru, Timnit
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 3609 - 3609
