Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice

Cited by: 3

Authors:

Li, Dan [1]

Affiliations:

[1] CUNY, Baruch Coll, Philosophy Dept, New York, NY 10031 USA

Source:

MINDS AND MACHINES | 2023 / Vol. 33 / No. 03

Keywords:

Induction; Machine learning; Data ontology; No Free Lunch theorem; Goodman's riddle of induction; CLIMATE; MODELS;

DOI:

10.1007/s11023-023-09639-9

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification code:

081104; 0812; 0835; 1405

Abstract:

As scientists start to adopt machine learning (ML) as one research tool, the security of ML and the knowledge generated become a concern. In this paper, I explain how supervised ML can be improved with better data ontology, or the way we make categories and turn information into data. More specifically, we should design data ontology in such a way that is consistent with the knowledge that we have about the target phenomenon so that such ontology can help us make the inductive leap. I do so by thinking through a thought experiment, Goodman's New Riddle of Induction (Fact, fiction, and forecast, Harvard University Press, 1955). Goodman's riddle helps flesh out three problems of induction: (1) the problem of equal goodies, that there are often too many equally good inductive results given the same data; (2) the problem of diverging performance, that these equally good results can give opposite predictions in the future; and (3) the problem of mediocrity, that when averaged across all equally possible datasets and tasks, no inductive algorithm outperforms any other. I show that all these three problems are manifested as real obstacles in ML practice, namely, the Rashomon effect (Breiman in Stat Sci 16(3):199-231, 2001), the problem of underspecification (D'Amour et al. in J Mach Learn Res, 2020, https://doi.org/10.48550/arXiv.2011.03395), and the No Free Lunch theorem (Wolpert in Neural Comput 8(7):1341-90, 1996, https://doi.org/10.1162/neco.1996.8.7.1341). Lastly, I argue that proper data ontology can help mitigate these problems and I demonstrate how using concrete examples from climate science. This research highlights the links between philosophers' discussions of induction and implications in ML practice.


Pages: 429-450

Page count: 22

35 items in total

[31] A Data-Driven Approach to Improve Cocoa Crop Establishment in Colombia: Insights and Agricultural Practice Recommendations from an Ensemble Machine Learning Model
Talero-Sarmiento, Leonardo
Roa-Prada, Sebastian
Caicedo-Chacon, Luz
Gavanzo-Cardenas, Oscar
AGRIENGINEERING, 2025, 7 (01):
[32] Rebuilding high-quality near-surface ozone data based on the combination of WRF-Chem model with a machine learning method to better estimate its impact on crop yields in the Beijing-Tianjin-Hebei region from 2014 to 2019
Han, Tian
Hu, Xiaomin
Zhang, Jing
Xue, Wenhao
Che, Yunfei
Deng, Xiaoqing
Zhou, Lihua
ENVIRONMENTAL POLLUTION, 2023, 336
[33] Better efficacy in differentiating WHO grade II from III oligodendrogliomas with machine-learning than radiologist’s reading from conventional T1 contrast-enhanced and fluid attenuated inversion recovery images
Sha-Sha Zhao
Xiu-Long Feng
Yu-Chuan Hu
Yu Han
Qiang Tian
Ying-Zhi Sun
Jie Zhang
Xiang-Wei Ge
Si-Chao Cheng
Xiu-Li Li
Li Mao
Shu-Ning Shen
Lin-Feng Yan
Guang-Bin Cui
Wen Wang
BMC Neurology, 20
[34] Better efficacy in differentiating WHO grade II from III oligodendrogliomas with machine-learning than radiologist's reading from conventional T1 contrast-enhanced and fluid attenuated inversion recovery images
Zhao, Sha-Sha
Feng, Xiu-Long
Hu, Yu-Chuan
Han, Yu
Tian, Qiang
Sun, Ying-Zhi
Zhang, Jie
Ge, Xiang-Wei
Cheng, Si-Chao
Li, Xiu-Li
Mao, Li
Shen, Shu-Ning
Yan, Lin-Feng
Cui, Guang-Bin
Wang, Wen
BMC NEUROLOGY, 2020, 20 (01)
[35] Editor's Roundup: Known Knowns, Known Unknowns, and Unknown Unknowns in ECT--Due Diligence and Preparation Are Sine Qua Nons of Practice; Machine Learning to Refine and Inform ECT Practice; the CARE Network Helps Drive Better Understanding of Treatment Variation to Improve Outcomes, Practice, Education, and Policy, Among Other Uses; Advocacy for ECT in Guidelines and in the Arts-A Reminder of Our Role
Espinoza, Randall T.
JOURNAL OF ECT, 2024, 40 (04) : 221 - 222
