Dealing with unknowns in machine translation

Authors
Sinha, RMK [1]
Affiliation
[1] Indian Inst Technol, Kanpur 208016, Uttar Pradesh, India
Source
2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: E-SYSTEMS AND E-MAN FOR CYBERNETICS IN CYBERSPACE | 2002
Keywords
machine translation; natural language processing; unknown words; English to Hindi
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An 'unknown' is defined as a word for which there is no entry in the dictionary used by the translation system. In general, a text may contain several unknowns. These words may be names, acronyms, abbreviations, terminology or foreign words. It is common practice in India to mix English words into Hindi and other Indian languages, and vice versa. However, the grammatical rules for forming gender, number, and verb-normalization or verb forms conform to those of the language being used, irrespective of the words' origin. As a result, unknown words are encountered frequently in day-to-day communication. A machine translation system must provide a mechanism for handling such unknowns. Spelling mistakes are yet another source of unknowns. In this paper we describe a strategy being adopted in our system for machine-aided translation from English to Hindi. No attempt is made to expand the vocabulary by deriving the meaning of unknown words. Instead, once an unknown is identified, its transliteration in Hindi, with appropriate suffixes or appendages, is substituted for its meaning. We use predictive parsing and a number of heuristics to identify the type of unknown.
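The first step the abstract describes, identifying unknowns by dictionary lookup and guessing their type, might be sketched as follows. This is only an illustration under stated assumptions: the toy `LEXICON`, the function name `classify_unknowns`, and the surface heuristics (all capitals suggests an acronym, a capitalized word that is not sentence-initial suggests a name) are hypothetical and are not the paper's predictive parsing or its actual heuristics. Transliteration into Hindi is not shown.

```python
import re

# Hypothetical toy lexicon; a real system would consult its full dictionary.
LEXICON = {"the", "minister", "visited", "office", "in"}

def classify_unknowns(sentence, lexicon=LEXICON):
    """Return (token, guessed_type) pairs for tokens missing from the lexicon.

    The type guesses are illustrative surface heuristics only, not the
    rules used in the paper.
    """
    tokens = re.findall(r"[A-Za-z]+", sentence)
    results = []
    for i, tok in enumerate(tokens):
        if tok.lower() in lexicon:
            continue  # known word: has a dictionary entry
        if len(tok) > 1 and tok.isupper():
            kind = "acronym"
        elif tok[0].isupper() and i > 0:
            kind = "name"  # capitalized, but not at the start of the sentence
        else:
            kind = "other"  # terminology, foreign word, or misspelling
        results.append((tok, kind))
    return results

print(classify_unknowns("The minister visited the IIT office in Kanpur"))
```

In a fuller pipeline, each pair would then be passed to a transliteration step that attaches the Hindi suffix appropriate to the guessed type and the word's grammatical role.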
Pages: 940-944
Number of pages: 5