An effective short text conceptualization based on new short text similarity

Cited by: 6
Authors
Bekkali, Mohammed [1]
Lachkar, Abdelmonaime [2]
Affiliations
[1] USMBA, ENSA, LISA Lab, Fes, Morocco
[2] AEU, ENSA, Tangier, Morocco
Keywords
Arabic language; Conceptualization; Word sense disambiguation; Short text similarity; Rough set theory
DOI
10.1007/s13278-018-0544-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Short texts such as messages, tweets and comments have recently come to make up a large portion of online text data. They are limited in length and differ from traditional documents in their shortness and sparseness. As a result, short text tends to be ambiguous, and the degree of ambiguity is not the same for all languages. Because Arabic is a highly inflectional language in which a single word can have multiple meanings, the short text representation plays a vital role in any text mining task. To address these issues, we propose an efficient representation for short text based on concepts instead of terms, using BabelNet as an external knowledge source. However, during conceptualization, searching for the concepts that correspond to a polysemous term returns multiple matches. Assigning a term to a concept is therefore a crucial step, and we believe that short text similarity can help overcome the problem of mapping a term to its corresponding concept. In this paper, we reintroduce a Web-based Kernel function for measuring the semantic relatedness between concepts, in order to disambiguate an expression against multiple candidate concepts. The proposed method has been evaluated using an Arabic short text categorization system, and the obtained results illustrate the value of our contribution.
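The disambiguation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate concept comes with a short textual description, and it replaces the Web-based Kernel with a plain cosine similarity over bag-of-words vectors (no web search expansion, no BabelNet lookup). The names `vectorize`, `kernel`, `disambiguate` and the English toy candidates are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Build an L2-normalised bag-of-words vector as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def kernel(a, b):
    """Dot product of two normalised vectors, i.e. cosine similarity.

    Assumption: stands in for the Web-based Kernel, which would work on
    expanded representations of the texts rather than the raw words.
    """
    va, vb = vectorize(a), vectorize(b)
    return sum(w * vb.get(t, 0.0) for t, w in va.items())


def disambiguate(short_text, candidates):
    """Pick the candidate concept whose description is most similar to the text."""
    return max(candidates, key=lambda c: kernel(short_text, candidates[c]))


# Hypothetical candidate concepts for the polysemous term "bank".
candidates = {
    "bank (financial institution)": "institution that accepts deposits and lends money",
    "bank (river side)": "sloping land beside a river or lake",
}
best = disambiguate("she deposits money at the bank", candidates)
print(best)
```

In this toy setting the short text shares words only with the first description, so that concept is selected. A realistic system would need richer representations, since short texts and concept descriptions often share no surface words at all.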
Pages: 11
Related papers (50 in total)
  • [1] Short text similarity based on probabilistic topics
    Xiaojun Quan
    Gang Liu
    Zhi Lu
    Xingliang Ni
    Liu Wenyin
    Knowledge and Information Systems, 2010, 25 : 473 - 491
  • [2] Short text similarity based on probabilistic topics
    Quan, Xiaojun
    Liu, Gang
    Lu, Zhi
    Ni, Xingliang
    Wenyin, Liu
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (03) : 473 - 491
  • [3] Text Similarity Function Based on Word Embeddings for Short Text Analysis
    Pascual, Adrian Jimenez
    Fujita, Sumio
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 391 - 402
  • [4] Short Text Understanding Combining Text Conceptualization and Transformer Embedding
    Li, Jun
    Huang, Guimin
    Chen, Jianheng
    Wang, Yabing
    IEEE ACCESS, 2019, 7 : 122183 - 122191
  • [5] Leveraging Conceptualization for Short-Text Embedding
    Huang, Heyan
    Wang, Yashen
    Feng, Chong
    Liu, Zhirun
    Zhou, Qiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (07) : 1282 - 1295
  • [6] An algorithm for semantic similarity of short text based on WordNet
    Zhai, Yan-Dong
    Wang, Kang-Ping
    Zhang, Dong-Na
    Huang, Lan
    Zhou, Chun-Guang
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2012, 40 (03) : 617 - 620
  • [7] Short Text Computing Based on Lexical Similarity Model
    Alhadi, Arifah Che
    Deraman, Aziz
    Jalil, Masita Masila Abdul
    Yussof, Wan Nural Jawahir Wan
    Noah, Shahrul Azman Mohd
    INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2019, 2019, 1078 : 355 - 366
  • [8] A Short Text Similarity Measure Based on Hidden Topics
    Chen, Hong-chao
    Guo, Xiao-hua
    Liu, Ling-qiang
    Zhu, Xin-hua
    COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017 : 1101 - 1108
  • [9] Similarity measures for short segments of text
    Metzler, Donald
    Dumais, Susan
    Meek, Christopher
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 16 - +
  • [10] Benchmarking short text semantic similarity
    O'Shea J.
    Bandar Z.
    Crockett K.
    McLean D.
    International Journal of Intelligent Information and Database Systems, 2010, 4 (02) : 103 - 120