A Hybrid Technological Innovation Text Mining, Ensemble Learning and Risk Scorecard Approach for Enterprise Credit Risk Assessment

Cited by: 4
Authors
Mao, Yang [1]
Liu, Shifeng [1]
Gong, Daqing [1]
Affiliation
[1] Beijing Jiaotong Univ, Sch Econ & Management, Haidian 100044, Peoples R China
Source
TEHNICKI VJESNIK-TECHNICAL GAZETTE | 2023 / Vol. 30 / No. 6
Keywords
ensemble learning; risk assessment; risk scorecard; technological innovation; text mining
DOI
10.17559/TV-20230316000447
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
Enterprise credit risk assessment models typically use financial information as predictor variables, relying on backward-looking historical data rather than forward-looking information. We propose a novel hybrid credit risk assessment approach that uses technological innovation information as a predictor variable. Text mining techniques are used to extract this information for each enterprise. A combination of random forest and extreme gradient boosting (XGBoost) is used for indicator screening, and a risk scorecard based on logistic regression is then used for credit risk scoring. Our results show that technological innovation indicators obtained through text mining provide valuable information for credit risk assessment, and that combining random forest and XGBoost ensemble screening with a logistic regression model outperforms other traditional methods. The best model achieved an area under the receiver operating characteristic curve (AUC) of 0.9129. In addition, our approach provides meaningful scoring rules for the credit risk assessment of technologically innovative enterprises.
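The abstract describes three stages: ensemble-based indicator screening, a logistic regression model, and conversion of the model output into scorecard points, with performance measured by AUC. The sketch below illustrates these stages using only the Python standard library. It is not the authors' implementation. The rule for combining random forest and XGBoost importances (averaging normalized importances and keeping the top k), the indicator names, the toy data, and the scaling parameters (600 points at 50:1 good-to-bad odds, 20 points to double the odds) are all hypothetical assumptions. In practice the importance dictionaries would come from fitted models, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier` or of `xgboost.XGBClassifier`.

```python
import math

def combine_importances(rf_imp, xgb_imp, k):
    """Hypothetical screening rule: average the normalized importances
    from two models and keep the k highest-ranked indicators."""
    def normalize(imp):
        total = sum(imp.values())
        return {name: v / total for name, v in imp.items()}
    rf_n, xgb_n = normalize(rf_imp), normalize(xgb_imp)
    names = set(rf_n) | set(xgb_n)
    avg = {n: (rf_n.get(n, 0.0) + xgb_n.get(n, 0.0)) / 2 for n in names}
    # Sort by descending importance; ties are broken alphabetically.
    return sorted(avg, key=lambda n: (-avg[n], n))[:k]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; label 1 = default.
    Returns (bias, weights)."""
    n_features = len(X[0])
    b, w = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = sigmoid(z) - yi
            grad_b += err
            for j in range(n_features):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
    return b, w

def to_score(log_odds_bad, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map log-odds of default to scorecard points (higher = lower risk).
    base_odds is the good:bad odds that receives base_score points;
    every doubling of the good:bad odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    # ln(good/bad odds) = -log_odds_bad
    return offset - factor * log_odds_bad

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical importances for four indicators (illustrative only).
    rf = {"patent_count": 0.40, "rd_intensity": 0.30,
          "debt_ratio": 0.20, "current_ratio": 0.10}
    xgb = {"patent_count": 0.35, "rd_intensity": 0.15,
           "debt_ratio": 0.40, "current_ratio": 0.10}
    print("selected:", combine_importances(rf, xgb, k=2))
    # Toy one-feature data set; label 1 = default.
    X = [[0.0], [1.0], [2.0], [3.0], [0.5], [2.5]]
    y = [0, 0, 1, 1, 0, 1]
    b, w = fit_logistic(X, y)
    z = [b + w[0] * x[0] for x in X]
    print("AUC:", auc([sigmoid(v) for v in z], y))
    print("scores:", [round(to_score(v)) for v in z])
```

The points-to-double-odds (PDO) scaling is a common scorecard convention: with factor = PDO / ln 2, a score is a linear function of the log-odds, so each doubling of the good-to-bad odds adds exactly PDO points, which keeps the resulting scoring rules easy to interpret.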
Pages: 1692-1703
Number of pages: 12