Learning Bilingual Lexicon for Low-Resource Language Pairs

Authors
Zhu, ShaoLin [1, 2, 3]
Li, Xiao [1, 2]
Yang, YaTing [1, 2]
Wang, Lei [1, 2]
Mi, ChengGang [1, 2]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[2] Key Lab Speech Language Informat Proc Xinjiang, Urumqi, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017 | 2018 / Vol. 10619
Funding
West Light Foundation of the Chinese Academy of Sciences
DOI
10.1007/978-3-319-73618-1_66
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon. We use Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable size and precision can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
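The following is a minimal sketch of the general CCA-based lexicon induction idea described above. It is not the authors' implementation, and their preprocessing, embedding dimensionality, and retrieval criterion may differ. The sketch assumes the following:

- Pre-trained monolingual embeddings are available as NumPy arrays.
- The seed lexicon is given as row-aligned source/target embedding pairs.
- A translation is chosen by cosine nearest neighbour in the shared space.

The function names (`fit_cca`, `induce_lexicon`, `project`, `inv_sqrt`) and the ridge parameter `reg` are hypothetical. The demo uses synthetic vectors, not real Chinese, Uyghur, or Kazakh embeddings.

```python
import numpy as np


def inv_sqrt(mat: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x_seed: np.ndarray, y_seed: np.ndarray, k: int, reg: float = 1e-3):
    """Fit linear CCA on row-aligned seed pairs.

    x_seed: (n, d1) source-language embeddings of the seed lexicon entries.
    y_seed: (n, d2) target-language embeddings of their translations.
    Returns both means and projections of shape (d1, k) and (d2, k) that map
    each monolingual space into a shared k-dimensional space.
    """
    mx, my = x_seed.mean(axis=0), y_seed.mean(axis=0)
    xc, yc = x_seed - mx, y_seed - my
    n = x_seed.shape[0]
    # Hypothetical ridge term that keeps the covariance matrices invertible.
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(xc.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(yc.shape[1])
    sxy = xc.T @ yc / (n - 1)
    sxx_is, syy_is = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular vectors of the whitened cross-covariance are the canonical
    # directions, ordered by decreasing canonical correlation.
    u, _, vt = np.linalg.svd(sxx_is @ sxy @ syy_is, full_matrices=False)
    wx = sxx_is @ u[:, :k]
    wy = syy_is @ vt.T[:, :k]
    return mx, my, wx, wy


def project(emb: np.ndarray, mean: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Centre, project into the shared space, and L2-normalise each row."""
    z = (emb - mean) @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def induce_lexicon(x_words, y_words, mx, my, wx, wy) -> np.ndarray:
    """For each source word, return the index of the most cosine-similar target word."""
    zx = project(x_words, mx, wx)
    zy = project(y_words, my, wy)
    return np.argmax(zx @ zy.T, axis=1)


if __name__ == "__main__":
    # Synthetic stand-in: the target space is a slightly noisy rotation of the source space.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(200, 20))
    rot, _ = np.linalg.qr(rng.normal(size=(20, 20)))
    tgt = src @ rot + 0.01 * rng.normal(size=(200, 20))
    # The first 100 rows act as the seed lexicon; the remaining rows are held out.
    mx, my, wx, wy = fit_cca(src[:100], tgt[:100], k=10)
    pred = induce_lexicon(src[100:], tgt[100:], mx, my, wx, wy)
    print("held-out accuracy:", (pred == np.arange(100)).mean())
```

Only the small seed set is used to estimate the projections; every other word is then translated by retrieval in the shared space. This matches the abstract's point that only a small seed lexicon is required. With real embeddings the linear relation is far from exact, so precision depends heavily on seed size and on the number of canonical dimensions `k` kept.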
Pages: 760-770
Page count: 11