Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages

Cited by: 2
Authors
Hillis, Steven [1]
Kumar, Anushree Prasanna [1]
Black, Alan W. [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
INTERSPEECH 2019 | 2019
Keywords
speech-to-speech; machine translation; segmentation; unit discovery; low-resource; unwritten languages; Wilderness;
DOI
10.21437/Interspeech.2019-3026
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104 ; 100213 ;
Abstract
We experiment with unsupervised methods for deriving and clustering symbolic representations of speech, working towards speech-to-speech translation for languages without regular (or any) written representations. We consider five low-resource African languages, and we produce three different segmental representations of text data for comparison against four different segmental representations derived solely from acoustic data for each language. The text and speech data for each language come from the CMU Wilderness dataset introduced in [1], where speakers read a version of the New Testament in their language. Our goal is to evaluate the translation performance not only of acoustically derived units but also of discovered sequences or "words" made from these units, with the intuition that such representations will encode more meaning than phones alone. We train statistical machine translation models for each representation and evaluate their outputs on the basis of BLEU-1 scores to determine their efficacy. Our experiments produce encouraging results: as we cluster our atomic phonetic representations into more word-like units, the amount of information retained generally approaches that of the actual words themselves.
Pages: 1138-1142
Page count: 5
Related papers
35 in total
  • [1] [Anonymous], 2017, P ICNLSSP CAS MOR
  • [2] Baljekar P, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P3194
  • [3] Low-Resource Speech-to-Text Translation
    Bansal, Sameer
    Kamper, Herman
    Livescu, Karen
    Lopez, Adam
    Goldwater, Sharon
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1298 - 1302
  • [4] Bansal Sameer, 2018, ARXIV180901431
  • [5] Bérard A, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P6224, DOI 10.1109/ICASSP.2018.8461690
  • [6] Berard Alexandre, 2016, NIPS WORKSH END TO E
  • [7] Towards speech translation of non written languages
    Besacier, Laurent
    Zhou, Bowen
    Gao, Yuqing
    [J]. 2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 222 - +
  • [8] Black A. W., 2019, ICASSP
  • [9] Black A. W., 2006, 9 INT C SPOK LANG PR
  • [10] Gage P., 1994, The C Users Journal, V12, P2338, DOI 10.5555/177910.177914