Text Mining and Comparative Visual Analytics on Large Collection of Speeches to Trace Socio-Political Issues

Times cited: 0
Author
Katre, Paritosh D. [1]
Affiliation
[1] Vishwakarma Institute of Information Technology, Department of Computer Engineering, Pune, Maharashtra, India
Source
Proceedings of the 2019 IEEE 9th International Conference on Advanced Computing (IACC 2019) | 2019
Keywords
latent Dirichlet allocation (LDA); natural language processing (NLP); comparative visual analytics; data science; social science; big data
DOI
10.1109/iacc48062.2019.8971605
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
We present an experimental study that applies Latent Dirichlet Allocation (LDA) and comparative visual analytics to trace the socio-political issues highlighted in large corpora of political speech transcripts. For this experiment, we built web scrapers to collect more than 500 speech transcripts and then analyzed this large body of text to derive insights from it. Using the LDA topic-modelling algorithm, we discovered latent "topics", referred to as issues in this paper, in the speech transcripts and visualized them with pyLDAvis, an interactive tool for visualizing the results of an LDA model. Alongside LDA, we generated graphical visualizations such as lexical dispersion plots and topic bar plots with Python's Matplotlib library. For the comparative analytics, visual graphs were generated for the speeches of two different candidates and juxtaposed to compare and interpret their discourse. Linguists have performed Political Discourse Analysis (PDA) manually, but analyzing such a large volume of speeches by hand is time-consuming and extremely complex. Our experiment, which focuses on identifying socio-political issues in speech transcripts with NLP-based text analytics, shows this to be a beneficial technique for PDA.
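The abstract names LDA but does not describe how topics are inferred. A minimal sketch, assuming plain Python and collapsed Gibbs sampling (one standard way to fit LDA; this record does not say which implementation or inference method the author used), is shown below. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha=0.1` and `beta=0.01`, and the toy tokenized "speeches" are hypothetical illustrations, not taken from the paper. The per-document topic proportions (`theta`) and per-topic word distributions (`phi`) it returns are the kinds of quantities that tools such as pyLDAvis visualize.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper; alpha/beta defaults are assumptions, not the paper's.
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order used by phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, k_topics = len(vocab), num_topics

    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    n_dk = [[0] * k_topics for _ in docs]
    n_kw = [[0] * n_vocab for _ in range(k_topics)]
    n_k = [0] * k_topics
    z = []  # z[d][i] = current topic of token i in document d

    def update(d, k, wid, delta):
        n_dk[d][k] += delta
        n_kw[k][wid] += delta
        n_k[k] += delta

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(k_topics)
            update(d, k, word_id[w], +1)
            z[d].append(k)

    topics = range(k_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                update(d, z[d][i], wid, -1)  # take the token out of the counts
                # Unnormalized full conditional p(z = k | all other assignments).
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][wid] + beta) / (n_k[k] + n_vocab * beta)
                    for k in topics
                ]
                k_new = rng.choices(topics, weights=weights)[0]
                update(d, k_new, wid, +1)
                z[d][i] = k_new

    # Smoothed point estimates from the final sample.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + k_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + n_vocab * beta) for v in range(n_vocab)]
        for k in topics
    ]
    return theta, phi, vocab


def top_words(phi, vocab, n=5):
    """Highest-probability words of each topic, a common way to label topics."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Hypothetical toy corpus: each "speech" is already tokenized.
speeches = [
    "tax jobs economy tax jobs".split(),
    "economy jobs tax growth".split(),
    "border security immigration border".split(),
    "security border immigration wall".split(),
]
theta, phi, vocab = lda_gibbs(speeches, num_topics=2, iterations=300)
print(top_words(phi, vocab, n=3))
```

With real transcripts, tokens would normally be lowercased and stripped of stop words first, and the number of topics tuned; a library implementation (for example gensim's `LdaModel`) is far faster than this pure-Python loop.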
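Lexical dispersion plots, as used in the comparative analysis described above, mark each occurrence of a chosen word by its position (token offset) in a text, with one row per word. The sketch below computes that underlying data for two speakers side by side. The helpers `tokenize`, `dispersion_offsets` and `compare_speakers`, the regex tokenizer, the per-1,000-token rate and the sample transcripts are all hypothetical, not the paper's code. The resulting offsets could then be drawn with Matplotlib (for example one `scatter` call per word, with the two candidates in side-by-side subplots) or with NLTK's `Text.dispersion_plot`.

```python
import re


def tokenize(text):
    """Very simple lowercase word tokenizer (an assumption, not the paper's)."""
    return re.findall(r"[a-z']+", text.lower())


def dispersion_offsets(tokens, targets):
    """Token offsets of each target word: the x-positions a dispersion plot marks.

    Assumes tokens are already lowercased.
    """
    offsets = {t.lower(): [] for t in targets}
    for i, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(i)
    return offsets


def compare_speakers(speeches_by_speaker, targets):
    """Juxtapose where and how often each speaker uses the target words."""
    report = {}
    for speaker, text in speeches_by_speaker.items():
        tokens = tokenize(text)
        offsets = dispersion_offsets(tokens, targets)
        # Normalize counts so speeches of different lengths are comparable.
        per_1000 = {
            word: 1000 * len(pos) / len(tokens) if tokens else 0.0
            for word, pos in offsets.items()
        }
        report[speaker] = {"n_tokens": len(tokens), "offsets": offsets, "per_1000": per_1000}
    return report


# Hypothetical transcripts for two hypothetical candidates.
speeches = {
    "Candidate A": "Jobs jobs and more jobs. Taxes will fall.",
    "Candidate B": "Secure the border. The border matters.",
}
for speaker, stats in compare_speakers(speeches, ["jobs", "border"]).items():
    print(speaker, stats["offsets"], stats["per_1000"])
```

Normalizing by speech length is one design choice among several; raw counts would overstate the issues of whichever candidate simply spoke longer.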
Pages: 108-114
Number of pages: 7