SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques

Cited by: 8
Authors
Suh, Jong Hwan [1]
Affiliations
[1] Gyeongsang Natl Univ, BERI, Dept Management Informat Syst, 501 Jinjudae Ro, Jinju Si 52828, Gyeongsangnam D, South Korea
Funding
National Research Foundation of Singapore;
Keywords
social-problem-specific key noun terms; temporal weights; sentiment analysis; complex network structure analysis; deep learning; ensemble learning methods; SENTIMENT CLASSIFICATION; INFORMATION; COMMUNITIES; TECHNOLOGY; DIVERSITY; STRATEGY; NETWORKS; PATTERNS; SCIENCE; SYSTEMS;
DOI
10.3390/su11010196
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
In the digital age, the abundant unstructured data on the Internet, particularly online news articles, provide opportunities for identifying social problems and understanding social systems for sustainability. However, previous work has not paid attention to the social-problem-specific perspectives of such big data, and it remains unclear how information technologies can use these data to identify and manage ongoing social problems. In this context, this paper introduces and focuses on social-problem-specific key noun terms, namely SocialTERMs, which can be used not only to search the Internet for social-problem-related data but also to monitor ongoing and future events related to social problems. Moreover, to reduce the time-consuming human effort needed to identify SocialTERMs, this paper designs and examines the SocialTERM-Extractor, an automatic approach that identifies the key noun terms of social-problem-related topics (SPRTs) in a large number of online news articles and predicts the SocialTERMs among the identified key noun terms. The paper is novel as the first attempt to identify and predict SocialTERMs from a large number of online news articles. It contributes to the literature by proposing three types of text-mining-based features, namely temporal weight, sentiment, and complex network structural features, and by comparing the performance of these features across various machine learning techniques, including deep learning. In particular, when applied to a large number of online news articles published in South Korea over a 12-month period and mostly written in Korean, the experimental results showed that Boosting Decision Tree gave the best performance with the full feature sets, and that SocialTERMs can be predicted with high performance by the proposed SocialTERM-Extractor.
Ultimately, this paper can benefit individuals or organizations who want to explore and use social-problem-related data in a systematic manner to understand and manage social problems, even if they are unfamiliar with ongoing social problems.
Pages: 44