Inducing stock market lexicons from disparate Chinese texts

Cited by: 3
Authors
Zhao, Futao [1]
Yao, Zhong [1, 2]
Luan, Jing [3]
Liu, Hao [4, 5]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing, Peoples R China
[2] Beihang Univ, Inst Econ & Business, Beijing, Peoples R China
[3] Beijing Jiaotong Univ, Sch Econ & Management, Beijing, Peoples R China
[4] Northeastern Univ, Sch Business Adm, Shenyang, Liaoning, Peoples R China
[5] Northeastern Univ Qinhuangdao, Qinhuangdao, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sentiment analysis; Stock market; Sentiment lexicon; SOCIAL MEDIA; SENTIMENT ANALYSIS; MICROBLOGGING DATA; IMPACT; NEWS; CLASSIFICATION; DICTIONARIES; ENGAGEMENT; RESOURCES; TWITTER;
DOI
10.1108/IMDS-04-2019-0254
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Purpose: The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.
Design/methodology/approach: This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a keyword-extraction-based method is used to build a high-quality seed lexicon, and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
Findings: The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.
Originality/value: This work appears to be the first to construct financial sentiment lexicons from over 2 million posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable for extracting collective sentiment from multiple media sources and providing decision-making support for stock market participants.
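The abstract names Pointwise Mutual Information (PMI) as one of the two word-association measures but gives no formula or implementation details. The following is a minimal sketch of one common way PMI can be used to score a candidate word against seed words, not the authors' implementation. The document-level co-occurrence counting, the seed-difference scoring, the zero score for unseen pairs, the function names `pmi_table` and `so_pmi`, and the toy English corpus are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return a pmi(x, y) function based on document-level co-occurrence.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with probabilities
    estimated as document frequencies divided by the number of documents.
    """
    n = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(x, y):
        joint = pair_df[frozenset((x, y))]
        if joint == 0:
            # Assumption: unseen pairs are scored as neutral (0.0)
            # instead of negative infinity, to keep sums finite.
            return 0.0
        return math.log2(joint * n / (word_df[x] * word_df[y]))

    return pmi


def so_pmi(word, pos_seeds, neg_seeds, pmi):
    """Association with positive seeds minus association with negative seeds."""
    pos = sum(pmi(word, s) for s in pos_seeds)
    neg = sum(pmi(word, s) for s in neg_seeds)
    return pos - neg


# Hypothetical toy corpus of tokenized headlines (illustrative only).
docs = [
    ["shares", "surge", "rally"],
    ["rally", "gain", "surge"],
    ["shares", "plunge", "selloff"],
    ["selloff", "loss", "plunge"],
]
pmi = pmi_table(docs)
pos_seeds, neg_seeds = ["gain"], ["loss"]

for candidate in ("surge", "plunge", "shares"):
    print(candidate, so_pmi(candidate, pos_seeds, neg_seeds, pmi))
```

In this sketch a positive score suggests the candidate co-occurs more with positive seeds and a negative score the opposite. A real pipeline for Chinese text would also need word segmentation and a much larger corpus, both of which are outside what the abstract specifies.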
Pages: 508-525
Page count: 18