Dealing with unknowns in machine translation

Cited by: 0
Authors
Sinha, RMK [1]
Affiliation
[1] Indian Inst Technol, Kanpur 208016, Uttar Pradesh, India
Source
2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: E-SYSTEMS AND E-MAN FOR CYBERNETICS IN CYBERSPACE | 2002
Keywords
machine translation; natural language processing; unknown words; English to Hindi
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An 'unknown' is defined as a word that has no entry in the dictionary used by the translation system. In general, a text may contain several unknowns. These words may be names, acronyms, abbreviations, terminology, or foreign words. In India it is common practice to mix English words into Hindi and other Indian languages, and vice versa. However, the grammatical rules for gender, number, and verb-normalization or verb forms conform to those of the language being used, irrespective of the words' origin. As a result, unknown words are encountered frequently in day-to-day communication. A machine translation system has to provide a mechanism for handling such unknowns. Spelling mistakes are yet another source of unknowns. In this paper we describe the strategy adopted in our system for machine-aided translation from English to Hindi. No attempt is made to expand the vocabulary by deriving the meaning of unknown words. Instead, once an unknown is identified, its transliteration in Hindi, with appropriate suffixes or appendages, is substituted for its meaning. We use predictive parsing and a number of heuristics to identify the type of unknown.
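The lookup-then-transliterate idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy dictionary, the `classify_unknown` heuristics (all-caps means acronym, a capitalized non-initial word means name), and the `placeholder_transliterate` function are all assumptions made for illustration. The paper's predictive parsing, suffix handling, and actual Hindi transliteration are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    source: str  # word as it appeared in the English input
    target: str  # dictionary gloss, or a transliteration stand-in
    kind: str    # "known", "acronym", "name", or "other"


def classify_unknown(word: str, sentence_initial: bool) -> str:
    """Guess the type of an unknown word (hypothetical surface heuristics)."""
    if len(word) > 1 and word.isupper():
        return "acronym"
    if word[:1].isupper() and not sentence_initial:
        return "name"
    return "other"


def placeholder_transliterate(word: str) -> str:
    """Stand-in for a real English-to-Hindi transliterator (assumption)."""
    return f"<translit:{word}>"


def translate_tokens(
    words: List[str],
    dictionary: Dict[str, str],
    transliterate: Callable[[str], str] = placeholder_transliterate,
) -> List[Token]:
    """Use the dictionary gloss when one exists; otherwise treat the word
    as an unknown, guess its type, and substitute a transliteration."""
    result = []
    for index, word in enumerate(words):
        gloss = dictionary.get(word.lower())
        if gloss is not None:
            result.append(Token(word, gloss, "known"))
        else:
            kind = classify_unknown(word, sentence_initial=(index == 0))
            result.append(Token(word, transliterate(word), kind))
    return result


# Toy dictionary with romanized Hindi glosses (illustrative only).
toy_dictionary = {"my": "mera", "book": "kitaab"}
tokens = translate_tokens(["My", "IEEE", "book", "Kanpur", "glorp"], toy_dictionary)
for token in tokens:
    print(token.source, token.kind, token.target)
```

Passing the transliterator in as a callable keeps the unknown-detection logic separate from script conversion, so a real transliteration routine could replace the placeholder without touching the lookup loop.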
Pages: 940-944
Page count: 5