Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; OPTIMIZATION APPROACH; MODEL
DOI
10.1109/TLA.2021.9475870
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Technologies for live presentations should consider users' capabilities to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects of text size on the execution time and cluster quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably well for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used the Snowball stemmer, TF-IDF, and K-means for clustering; the data sets were Reuters, 20 Newsgroups, and an experimental data set, for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest average time for the long texts. The second result showed that the number of phrases in each text type significantly predicts execution time, whereas the number of clusters generated by K-means does not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predict inertia, with the short texts showing the lowest inertia. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
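The abstract names inertia and purity as its quality measures but does not give their formulas. The sketch below assumes the standard definitions: inertia as the sum of squared Euclidean distances from each point to its assigned centroid (the quantity K-means minimizes), and purity as the fraction of items that carry the majority true label of their cluster. The function names and the toy data are illustrative and are not taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of items whose true label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")
    # Count true labels inside each cluster.
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


def inertia(points, cluster_ids, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, cid in zip(points, cluster_ids):
        center = centroids[cid]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


# Hypothetical toy data: six phrases in two clusters with known topics.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_topics = ["a", "a", "b", "b", "b", "a"]
toy_purity = purity(toy_clusters, toy_topics)

# Hypothetical 2-D vectors standing in for TF-IDF rows.
toy_points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
toy_assignment = [0, 0, 1, 1]
toy_centroids = {0: (0.0, 1.0), 1: (4.0, 1.0)}
toy_inertia = inertia(toy_points, toy_assignment, toy_centroids)
```

Under these definitions, lower inertia means tighter clusters, and purity ranges from 0 to 1. Inertia depends on the number of points and on the scale of the vectors, so raw values may not be directly comparable across data sets of different sizes.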
Pages: 1391-1399
Number of pages: 9