Hashtags: an essential aspect of topic modeling of city events through social media

Cited by: 1
Authors
Kovalchuk, Mikhail [1]
Nasonov, Denis [1]
Affiliation
[1] ITMO Univ, St Petersburg, Russia
Source
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021) | 2021
Keywords
Topic modeling; Hashtag; Text clustering; Social media; Event detection;
DOI
10.1109/ICMLA52953.2021.00255
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Today, cities generate large volumes of digital information that can be useful in many applications. Instagram, Facebook, VKontakte, and other popular social networks contain a vast amount of valuable data. This information reflects not only people's individual stories but also the background of the city: its events and current activities in different areas and places of attraction. City events have essential attributes such as time of occurrence, geographical coverage, audience, and, often, expressed interests or topics. Knowing the topic of an event makes it possible to solve a whole range of tasks, from individual leisure recommendation systems for citizens and tourists to services in food (food trucks) and transport (taxis). Determining the topic of events requires solving two tasks: identifying the events themselves among a variety of city posts, and developing an approach based on modern natural language processing methods for identifying event topics. To detect events, we propose an improved version of our previously developed algorithm, which integrates a time-window and area-coverage strategy. The main focus of the work, however, is the analysis of different approaches to identifying topics, taking into account the heterogeneity of posts in semantic meaning, size, and structure. In particular, the paper examines the importance of using post hashtags in several variations to build more accurate models. In addition, features specific to different language groups are analyzed.
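The abstract does not spell out which hashtag variations were compared. As a rough illustration only, the sketch below assumes three plausible input variants for a topic model: post text without hashtags, hashtags alone, and text with hashtags appended. The function name `post_variants` and the variant names are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict

# Matches "#tag" and captures the tag without the leading "#".
# \w is Unicode-aware in Python 3, so Cyrillic hashtags also match.
HASHTAG_RE = re.compile(r"#(\w+)")


def post_variants(post: str) -> Dict[str, str]:
    """Build three assumed input variants of a post for topic modeling."""
    # Lowercased hashtags, in order of appearance.
    tags = [tag.lower() for tag in HASHTAG_RE.findall(post)]
    # Post text with hashtags removed and whitespace normalized.
    text_only = " ".join(HASHTAG_RE.sub(" ", post).split())
    hashtags_only = " ".join(tags)
    # Text followed by hashtags, skipping whichever part is empty.
    combined = " ".join(part for part in (text_only, hashtags_only) if part)
    return {
        "text_only": text_only,
        "hashtags_only": hashtags_only,
        "text_plus_hashtags": combined,
    }


if __name__ == "__main__":
    print(post_variants("Great jazz night at the park #Jazz #spb"))
```

Each variant could then be fed to the same topic model so that the contribution of hashtags is compared on equal terms.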
Pages: 1594-1599
Number of pages: 6