Comparative Analysis of Machine Learning Models for Predicting Innovation Outcomes: An Applied AI Approach

Cited by: 1

Authors:

Martinovic, Marko [1]

Dokic, Kristian [2]

Pudic, Dalibor [3]

Affiliations:

[1] Univ Slavonski Brod, Tech Dept, Trg Ivane Brlic Mazuranic 2, Slavonski Brod 35000, Croatia

[2] Univ Osijek, Fac Tourism & Rural Dev, Dept Informat & Commun Sci, Vukovarska 17, Pozega 34000, Croatia

[3] Univ North, Dept Business Econ, Ul Jurja Krizan 31b, Varazhdin 42000, Croatia

Source:

APPLIED SCIENCES-BASEL | 2025 / Vol. 15 / Issue 07

Keywords:

innovation prediction; machine learning; ensemble methods; cross-validation; classification performance; computational efficiency; Community Innovation Survey; CLASSIFIERS; NETWORKS

DOI:

10.3390/app15073636

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Predicting innovation outcomes at the firm level continues to be an important but challenging goal for researchers and practitioners alike. In this study, multiple machine learning models, encompassing both ensemble-based and single-model approaches, were applied to data from the Community Innovation Survey. Methods included random forests, gradient boosting frameworks, support vector machines, neural networks, and logistic regression, each with hyperparameters optimized through Bayesian search routines and evaluated using corrected cross-validation techniques. The results showed that tree-based boosting algorithms consistently outperformed other models in accuracy, precision, F1-score, and ROC-AUC, while the kernel-based approach excelled in recall. Logistic regression proved to be the most computationally efficient model despite its weaker predictive power. The statistical analyses made it clear that the choice of an appropriate cross-validation protocol and accounting for overlapping data splits are crucial to reduce bias and ensure reliable comparisons. Overall, the results indicate that ensemble methods generally provide robust classification performance for innovation prediction tasks. However, individual models may still prove advantageous under certain metric-specific conditions or computational constraints. These observations emphasize the need to match model selection with data structure, performance objectives, and practical resource constraints when predicting and improving innovation outcomes at the firm level.
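The abstract's point about overlapping data splits can be illustrated with a small sketch. This record does not say which correction the authors applied, so the example below assumes the widely used variance correction for resampled t-tests, which replaces the naive factor 1/k with 1/k + n_test/n_train. The function name `corrected_resampled_t` and the sample numbers are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t statistic for comparing two models.

    ASSUMPTION: uses the (1/k + n_test/n_train) variance correction;
    the record does not confirm this is the authors' exact procedure.

    diffs   : per-split score differences (model A minus model B)
    n_train : number of samples in each training split
    n_test  : number of samples in each test split
    Returns (t_statistic, degrees_of_freedom).
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two splits")
    d_bar = mean(diffs)
    s2 = variance(diffs)  # sample variance (divides by k - 1)
    if s2 == 0:
        raise ValueError("score differences have zero variance")
    # A naive t-test would use s2 / k. Training sets from different
    # splits overlap, so the scores are correlated; the extra
    # n_test / n_train term inflates the variance to compensate.
    corrected_var = (1.0 / k + n_test / n_train) * s2
    return d_bar / math.sqrt(corrected_var), k - 1


# Hypothetical accuracy differences from five splits of 1000 samples
t_stat, dof = corrected_resampled_t(
    [0.02, 0.01, 0.03, 0.02, 0.02], n_train=800, n_test=200
)
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

With these made-up numbers the corrected statistic is noticeably smaller than the naive one. This matches the abstract's warning: ignoring overlap between splits can make differences between models look more significant than they are.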

Pages: 44
