How Much Can Machines Learn Finance from Chinese Text Data?

Cited by: 6
Authors
Zhou, Yang [1, 2]
Fan, Jianqing [3, 4, 5]
Xue, Lirong [4]
Affiliations
[1] Fudan Univ, Inst Big Data, Shanghai 200433, Peoples R China
[2] Fudan Univ, MOE Lab Natl Dev & Intelligent Governance, Shanghai 200433, Peoples R China
[3] Capital Univ Econ & Business, Int Sch Econ & Management, Beijing 100070, Peoples R China
[4] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[5] Fudan Univ, Sch Data Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
machine learning; FarmPredict; factor model; sparse regression; textual analysis; investor sentiment; stock; returns; number; risk; news
DOI
10.1287/mnsc.2022.01468
Chinese Library Classification number
C93 [Management Science]
Discipline classification code
12; 1201; 1202; 120202
Abstract
How much can we learn finance directly from text data? This paper presents a new framework for learning textual data based on the factor augmentation model and sparsity regularization, called the factor-augmented regularized model for prediction (FarmPredict), to let machines learn financial returns directly from news. FarmPredict allows the model itself to extract information directly from articles without predefined information, such as the dictionaries or pretrained models used in most studies. Using factors learned without supervision to augment the predictors gives our method a "double-robust" feature: the machine learns to balance between individual words and text factors/topics. It also avoids the information loss that factor regression incurs in dimensionality reduction. We apply our model to the Chinese stock market, which has a large proportion of retail investors, by using Chinese news data to predict financial returns. We show that positive sentiments scored by our FarmPredict approach from news generate on average 83 basis points (bps) of daily excess stock returns, and negative news has an adverse impact of 26 bps on the days of news announcements, where both effects can last for a few days. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market. The result shows that the machine-learned prediction does provide sizeable predictive power, with an annualized return of up to 54% under a simple investment strategy. FarmPredict significantly outperforms other statistical and machine learning methods on model prediction and portfolio performance. Our study demonstrates the ability of machines to learn from text data.
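The two ingredients named in the abstract, factor augmentation and sparsity regularization, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a document-by-word count matrix `X` and a return vector `y`, extracts factors by plain PCA, and fits an L1-penalized regression on the factors plus the idiosyncratic word components. Leaving the factor coefficients unpenalized is an assumption of this sketch, and the function names (`farm_fit`, `farm_predict`, `lasso_cd`) and the penalty level `lam` are hypothetical. Any word screening or tuning steps used in the paper are omitted.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(Z, y, lam, penalized, n_iter=200):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * sum of |b_j| over penalized j."""
    n, p = Z.shape
    beta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - Z @ beta, beta starts at 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                    # skip constant-zero columns
            rho = Z[:, j] @ resid / n + col_sq[j] * beta[j]
            if penalized[j]:
                new = soft_threshold(rho, lam) / col_sq[j]
            else:
                new = rho / col_sq[j]       # ordinary least-squares update
            resid += Z[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def farm_fit(X, y, n_factors, lam):
    """Fit the sketch: PCA factors (unpenalized) + idiosyncratic words (L1-penalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal directions of the centered word-count matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors].T             # shape (n_words, n_factors)
    F = Xc @ loadings                       # factor scores per document
    U = Xc - F @ loadings.T                 # idiosyncratic word components
    Z = np.hstack([F, U])
    penalized = np.array([False] * n_factors + [True] * X.shape[1])
    coef = lasso_cd(Z, y - y_mean, lam, penalized)
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coef": coef}


def farm_predict(model, X_new):
    """Score new documents with the fitted factors and sparse word coefficients."""
    Xc = X_new - model["x_mean"]
    F = Xc @ model["loadings"]
    U = Xc - F @ model["loadings"].T
    return model["y_mean"] + np.hstack([F, U]) @ model["coef"]
```

In this sketch the first `n_factors` entries of `coef` weight the text factors and the remaining entries weight individual words. A large `lam` drives the word coefficients to zero so that the prediction relies on factors alone, while a small `lam` lets individual words contribute, which is one way to read the balance between words and factors that the abstract describes.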
Pages: 8962-8987
Number of pages: 27