Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space

Cited by: 0
Authors
Park, Seongmin [1]
Seo, Jinkyu [2]
Lee, Jihwa [1]
Affiliations
[1] ActionPower, Seoul, South Korea
[2] Seoul Natl Univ, Dept Appl Biol & Chem, Seoul, South Korea
Source
INTERSPEECH 2023 | 2023
Keywords
topic segmentation; hyperdimensional computing; summarization
DOI
10.21437/Interspeech.2023-1859
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present HyperSeg, a hyperdimensional computing (HDC) approach to unsupervised dialogue topic segmentation. HDC is a class of vector symbolic architectures that leverages the probabilistic orthogonality of randomly drawn vectors at extremely high dimensions (typically over 10,000). HDC generates rich token representations through its low-cost initialization of many unrelated vectors. This is especially beneficial in topic segmentation, which often operates as a resource-constrained preprocessing step for downstream transcript understanding tasks. HyperSeg outperforms the current state of the art on 4 out of 5 segmentation benchmarks, even when baselines are given partial access to the ground truth, and is 10 times faster on average. We show that HyperSeg also improves downstream summarization accuracy. With HyperSeg, we demonstrate the viability of HDC in a major language task. We open-source HyperSeg to provide a strong baseline for unsupervised topic segmentation.
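The "probabilistic orthogonality" property mentioned in the abstract can be illustrated with a small sketch. This is not HyperSeg's implementation: the bipolar {-1, +1} encoding and the helper names `random_hypervector`, `cosine`, and `max_abs_pairwise_cosine` are assumptions chosen for illustration. It only shows that independently drawn random vectors become nearly orthogonal to one another as the dimension grows.

```python
import math
import random


def random_hypervector(dim, rng):
    """Draw a bipolar vector with entries chosen uniformly from {-1, +1} (assumed encoding)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_abs_pairwise_cosine(dim, count, seed=0):
    """Largest absolute cosine similarity among `count` random vectors of size `dim`."""
    rng = random.Random(seed)
    vectors = [random_hypervector(dim, rng) for _ in range(count)]
    return max(
        abs(cosine(vectors[i], vectors[j]))
        for i in range(count)
        for j in range(i + 1, count)
    )


# Compare a low dimension with the 10,000-dimensional regime named in the abstract.
low = max_abs_pairwise_cosine(dim=16, count=10)
high = max_abs_pairwise_cosine(dim=10_000, count=10)
print(f"worst-case |cosine| at dim=16:     {low:.3f}")
print(f"worst-case |cosine| at dim=10,000: {high:.3f}")
```

For random bipolar vectors, the cosine similarity of a pair has a standard deviation of about 1/sqrt(dim), so at 10,000 dimensions even the worst pair stays close to zero. This is why many unrelated vectors can be initialized cheaply without interfering with one another.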
Pages: 730-734
Page count: 5