Effects on Time and Quality of Short Text Clustering during Real-Time Presentations

Cited by: 9
Authors
Fuentealba, Diego [1]
Lopez, Mario [2]
Ponce, Hector [3]
Affiliations
[1] Univ Santiago Chile, VirtuaLab, Dept Ind Engn, Santiago, Chile
[2] Univ Santiago Chile, Dept Ind Engn, Santiago, Chile
[3] Univ Santiago Chile, Dept Accounting & Auditing, Fac Adm & Econ, Santiago, Chile
Keywords
Silicon compounds; Blogs; Real-time systems; Clustering algorithms; Social networking (online); IEEE transactions; Visualization; Text Mining; TF-IDF; K-Means; Short Phrases; Short Text; Sentences; Clustering; Interactivity; OPTIMIZATION APPROACH; MODEL
DOI
10.1109/TLA.2021.9475870
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Technologies for live presentations should account for users' capacity to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines how text clustering algorithms perform, in terms of time and quality, when applied to short, medium, and long texts, and whether short text clustering performs well enough for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The pipeline used a Snowball stemmer, TF-IDF, and K-means for clustering; the data sets were Reuters, 20 Newsgroups, and an experimental data set for the long, medium, and short texts, respectively. First, text size had a large effect on the algorithms' execution time, with the shortest average time for the short texts and the longest for the long texts. Second, the number of phrases in each text type significantly predicts execution time, whereas the number of clusters generated by K-means does not. Inertia and purity measures were used to test the quality of the generated clusters. Text size, number of phrases, and number of clusters predict inertia, with the lowest inertia for the short texts. Purity measures were similar to previously reported results for all text types. Thus, clustering algorithms for short texts can confidently be used in real-time presentations.
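The pipeline and measurements named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the function names `cluster_phrases` and `purity`, and the toy phrases are all assumptions. Stemming is passed in as an optional `stem` callable (identity by default); NLTK's `SnowballStemmer("english").stem` could be supplied there if NLTK is installed.

```python
import time
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority true class of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    # For each cluster, count the members of its most frequent true class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def cluster_phrases(phrases, n_clusters, stem=lambda word: word, seed=0):
    """Stem, vectorize with TF-IDF, and cluster with K-means.

    Returns (cluster labels, inertia, elapsed seconds). The `stem` callable
    is a hypothetical hook; the default leaves words unchanged.
    """
    start = time.perf_counter()
    stemmed = [" ".join(stem(w) for w in p.lower().split()) for p in phrases]
    tfidf = TfidfVectorizer().fit_transform(stemmed)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(tfidf)
    elapsed = time.perf_counter() - start
    return model.labels_, model.inertia_, elapsed


if __name__ == "__main__":
    # Hypothetical short phrases with known topics (0 = clustering, 1 = weather).
    phrases = [
        "clustering algorithms group similar texts",
        "text clustering with tfidf features",
        "short texts need fast clustering",
        "sunny weather with light wind",
        "rain and wind in the forecast",
        "the weather forecast says rain",
    ]
    topics = [0, 0, 0, 1, 1, 1]
    labels, inertia, seconds = cluster_phrases(phrases, n_clusters=2)
    print(f"time={seconds:.4f}s inertia={inertia:.3f} "
          f"purity={purity(topics, labels):.2f}")
```

Repeating the call while varying the number of phrases and `n_clusters` would give the kind of time and inertia measurements the abstract describes; absolute values depend on hardware and data, so none are claimed here.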
Pages: 1391-1399
Page count: 9