Dealing with unknowns in machine translation

Cited by: 0
Author
Sinha, RMK [1]
Affiliation
[1] Indian Inst Technol, Kanpur 208016, Uttar Pradesh, India
Source
2001 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5: E-SYSTEMS AND E-MAN FOR CYBERNETICS IN CYBERSPACE | 2002
Keywords
machine translation; natural language processing; unknown words; English to Hindi
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An 'unknown' is a word that has no entry in the dictionary used by the translation system. A text may, in general, contain several unknowns: names, acronyms, abbreviations, terminology or foreign words. It is common practice in India to mix English words into Hindi and other Indian languages, and vice versa. The grammatical rules for constructing gender, number, and verb-normalization or verb forms nevertheless conform to those of the language being used, irrespective of the words' origin. As a result, unknown words are encountered frequently in day-to-day communication, and a machine translation system has to provide a mechanism for handling them. Spelling mistakes are yet another source of unknowns. In this paper we describe the strategy adopted in our system for machine-aided translation from English to Hindi. No attempt is made to expand the vocabulary by deriving the meaning of unknown words. Instead, once an unknown is identified, its transliteration in Hindi, with appropriate suffixes or appendages, is substituted for its meaning. We use predictive parsing and a number of heuristics to identify the type of unknown.
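The pipeline outlined in the abstract (detect a word with no dictionary entry, guess its type, substitute a Hindi transliteration plus a suffix) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy DICTIONARY, the hand-written TRANSLIT table, and the function names find_unknowns, guess_type and substitute are hypothetical. The surface heuristics stand in for the paper's predictive parsing and heuristics, which the abstract does not specify.

```python
import re

# Hypothetical toy lexicon; a real system would consult its full bilingual dictionary.
DICTIONARY = {"the", "minister", "visited", "office", "of", "in", "on", "monday"}

# Hypothetical hand-written transliterations; a real system would generate these
# with a transliteration module rather than a lookup table.
TRANSLIT = {"IIT": "आईआईटी", "Kanpur": "कानपुर", "computer": "कंप्यूटर"}


def find_unknowns(tokens, dictionary=DICTIONARY):
    """Return the tokens that have no dictionary entry (case-insensitive lookup)."""
    return [t for t in tokens if t.lower() not in dictionary]


def guess_type(token, sentence_initial=False):
    """Guess the type of an unknown from its surface form only.

    These rules are illustrative stand-ins, not the paper's actual heuristics.
    """
    # Two or more capital letters, each optionally followed by a period: IIT, U.S.
    if re.fullmatch(r"(?:[A-Z]\.?){2,}", token):
        return "acronym"
    # Short token ending in a period: etc., Dr.
    if token.endswith(".") and len(token) <= 5:
        return "abbreviation"
    # Capitalised word that does not start the sentence.
    if token[0].isupper() and not sentence_initial:
        return "name"
    return "other"


def substitute(token, suffix=""):
    """Replace an unknown with its transliteration plus an optional Hindi suffix.

    Falls back to the original Latin-script token when no transliteration is known.
    """
    return TRANSLIT.get(token, token) + suffix


tokens = ["The", "minister", "visited", "IIT", "Kanpur", "on", "Monday"]
unknowns = find_unknowns(tokens)
labelled = [(t, guess_type(t), substitute(t)) for t in unknowns]
```

Here unknowns holds the two tokens missing from the toy lexicon, and labelled pairs each one with its guessed type and substituted form. The suffix argument only marks where target-language inflection would attach; choosing the right suffix needs grammatical context that this sketch does not model.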
Pages: 940-944
Page count: 5