Data generation using sequence-to-sequence

Cited by: 0
Authors
Joshi, Akshat [1 ]
Mehta, Kinal [1 ]
Gupta, Neha [1 ]
Valloli, Varun Kannadi [1 ]
Affiliations
[1] C-DAC, GIST Group, Pune, Maharashtra, India
Source
2018 IEEE RECENT ADVANCES IN INTELLIGENT COMPUTATIONAL SYSTEMS (RAICS), 2018
Keywords
Sequence2Sequence; NLP; transliteration; LSTM; encoder; decoder
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Sequence-to-sequence models have shown great promise on problems such as Neural Machine Translation (NMT), text summarization, and paraphrase generation. Deep Neural Networks (DNNs) work well with large, labeled training sets, but in sequence-to-sequence problems the mapping becomes much harder because of differences in syntax, semantics, and length. Moreover, the usage of DNNs is constrained by the fixed dimensionality of the input and output, which does not hold for most Natural Language Processing (NLP) problems. Our primary focus was to build transliteration systems for Indian languages. For Indian languages, monolingual corpora are abundantly available, but parallel corpora that can be directly applied to the transliteration problem are scarce. With the available parallel corpus, we could only build weak models. We propose a system that leverages the monolingual corpus to generate a clean, high-quality parallel corpus for transliteration, which is then iteratively used to tune the existing weak transliteration models. Our results support the hypothesis that the generation of clean data can be validated objectively by evaluating the models alongside the efficiency of the system at generating data in each iteration.
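The iterative scheme the abstract describes (use a weak model to label monolingual words, keep only confident pairs, retrain, repeat) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the paper uses an LSTM encoder-decoder, while the toy `train`, `transliterate`, and `bootstrap` functions, the per-character majority-vote model, and the confidence threshold below are all assumptions made for illustration.

```python
# Toy sketch of iterative self-training for transliteration:
# a weak model generates candidate (source, target) pairs from a
# monolingual corpus; high-confidence pairs are added to the parallel
# corpus and the model is retrained on the enlarged set.
from collections import Counter, defaultdict


def train(pairs):
    """Learn a per-character majority-vote mapping from equal-length pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def transliterate(model, word):
    """Return (output, confidence); confidence = fraction of known characters."""
    out = "".join(model.get(ch, ch) for ch in word)
    known = sum(ch in model for ch in word)
    return out, known / len(word)


def bootstrap(seed_pairs, mono_corpus, iterations=3, threshold=1.0):
    """Iteratively grow the parallel corpus with confident self-labels."""
    pairs = list(seed_pairs)
    for _ in range(iterations):
        model = train(pairs)
        for word in mono_corpus:
            out, conf = transliterate(model, word)
            if conf >= threshold and (word, out) not in pairs:
                pairs.append((word, out))
    return train(pairs), pairs
```

Evaluating the retrained model after each iteration, together with how many candidate pairs clear the threshold, gives the objective validation signal the abstract refers to.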
Pages: 108-112
Page count: 5