A New Word Clustering Algorithm Based on Word Similarity

Cited by: 0

Authors:

YUAN Lichi [1]

Affiliations:

[1] School of Information Technology, Jiangxi University of Finance and Economics

Source:

Chinese Journal of Electronics | 2017, Vol. 26, No. 6

Funding:

National Natural Science Foundation of China;

Keywords:

Word similarity; Word clustering; Statistical language model;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text information processing];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Category-based statistical language models are an important approach to the data sparsity problem in statistical language modeling. However, they face two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; 2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a novel definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word-set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance; the perplexity is reduced from 283 to 207.8.
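
The abstract does not give the paper's actual definitions, so the following is only an illustrative sketch of the general idea: a mutual-information-based word similarity, a word-set similarity built on top of it, and a bottom-up (agglomerative) merge loop. Everything concrete here is an assumption made for illustration, not the author's method: the function names (`pmi_vectors`, `word_similarity`, `set_similarity`, `cluster`), the use of positive pointwise mutual information over right-neighbor bigrams, cosine similarity between words, and average linkage between word sets.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_vectors(tokens):
    """Map each word to {right neighbor: positive PMI}, estimated from adjacent bigrams.

    Assumption: PMI with the following word stands in for the paper's
    mutual-information-based measure, which the abstract does not specify.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    vectors = {w: {} for w in unigrams}
    for (w, c), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w] / n_uni) * (unigrams[c] / n_uni)
        pmi = math.log(p_joint / p_indep)
        if pmi > 0:  # keep only positive associations
            vectors[w][c] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse PMI vectors (hypothetical choice)."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(a, b, vectors):
    """Average pairwise word similarity between two word sets (average linkage)."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(tokens, num_classes):
    """Bottom-up clustering: start with singletons, merge the most similar pair."""
    vectors = pmi_vectors(tokens)
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > num_classes:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters
```

Each merge step in this sketch rescans all cluster pairs, which is far too slow for a real vocabulary; it is meant only to make the structure of a similarity-driven bottom-up clustering concrete, and says nothing about how the paper achieves its reported speed.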


Pages: 1221-1226

Page count: 6
