Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; OPTIMIZATION APPROACH; MODEL
DOI
10.1109/TLA.2021.9475870
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Technologies for live presentations should consider users' capabilities to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects of text size on the execution time and cluster quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used the Snowball stemmer, TF-IDF, and K-means for clustering; the data sets were Reuters, 20 Newsgroups, and an experimental data set for the long, medium, and short texts, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and longer average times for the long texts. The second result showed that the number of phrases in each text type significantly predicted execution time, whereas the number of clusters generated by K-means did not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predicted inertia, with the lowest inertia for the short texts. Purity values were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
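The pipeline named in the abstract (TF-IDF weighting, K-means clustering, then inertia and purity as quality measures, with execution time recorded) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the phrases, the labels in `truth`, the fixed initial centroids, and all function names are hypothetical, the stemming step is omitted, and the smoothed IDF formula is one common variant chosen for illustration.

```python
import math
import time
from collections import Counter


def tfidf(docs):
    """Return L2-normalised TF-IDF vectors (dense lists) for tokenised docs."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k, init, iters=100):
    """Lloyd's algorithm; centroids start at the rows of X listed in init."""
    centroids = [X[i][:] for i in init]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(x, centroids[c]))
                      for x in X]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances to the assigned centroid.
    inertia = sum(sqdist(x, centroids[l]) for x, l in zip(X, labels))
    return labels, inertia


def purity(labels, truth):
    """Fraction of items that belong to the majority class of their cluster."""
    total = 0
    for c in set(labels):
        classes = Counter(t for l, t in zip(labels, truth) if l == c)
        total += classes.most_common(1)[0][1]
    return total / len(labels)


# Hypothetical short phrases with hand-assigned topic labels.
phrases = [
    "clustering short text",
    "short text clustering fast",
    "text clustering quality",
    "live audience questions",
    "audience questions live presentation",
    "presentation audience feedback",
]
truth = [0, 0, 0, 1, 1, 1]

start = time.perf_counter()
X = tfidf([p.lower().split() for p in phrases])
labels, inertia = kmeans(X, k=2, init=[0, 3])
elapsed = time.perf_counter() - start

score = purity(labels, truth)
print(f"labels={labels} inertia={inertia:.3f} purity={score:.2f} "
      f"time={elapsed * 1000:.2f} ms")
```

Lower inertia indicates tighter clusters, and purity requires reference labels such as `truth`. Wrapping the vectorisation and clustering steps in `time.perf_counter()` mirrors, in a simplified way, the kind of execution-time measurement the study reports.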
Pages: 1391-1399
Number of pages: 9