Generating Synthetic Data for Neural Keyword-to-Question Models

Authors
Ding, Heng [1]
Balog, Krisztian [2]
Affiliations
[1] Wuhan University, Wuhan, People's Republic of China
[2] University of Stavanger, Stavanger, Norway
Source
Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '18), 2018
Keywords
Keyword-to-question; synthetic data generation; neural machine translation
DOI
10.1145/3234944.3234964
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Search typically relies on keyword queries, but these are often semantically ambiguous. We propose to overcome this by offering users natural language questions, based on their keyword queries, to disambiguate their intent. This keyword-to-question task may be addressed using neural machine translation techniques. Neural translation models, however, require massive amounts of training data (keyword-question pairs), which is unavailable for this task. The main idea of this paper is to generate large amounts of synthetic training data from a small seed set of hand-labeled keyword-question pairs. Since natural language questions are available in large quantities, we develop models to automatically generate the corresponding keyword queries. Further, we introduce various filtering mechanisms to ensure that synthetic training data is of high quality. We demonstrate the feasibility of our approach using both automatic and manual evaluation.
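The abstract describes a two-stage idea: learn from a small seed set how keyword queries relate to questions, use that to generate keyword queries for a large pool of unlabeled questions, then filter out low-quality synthetic pairs. The sketch below illustrates that pipeline in miniature. It is not the authors' method: the paper trains generation models, whereas this sketch assumes a hypothetical token-retention model (for each word, the fraction of seed questions in which it survives into the keyword query) and a hypothetical filter (at least two terms, and strictly shorter than the question). All function names, thresholds, and example data are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def learn_keep_probs(seed_pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Hypothetical model: for each question word, estimate how often it
    is kept in the paired keyword query across the hand-labeled seed set."""
    seen, kept = Counter(), Counter()
    for keywords, question in seed_pairs:
        kw_terms = set(tokenize(keywords))
        for tok in tokenize(question):
            seen[tok] += 1
            if tok in kw_terms:
                kept[tok] += 1
    return {tok: kept[tok] / seen[tok] for tok in seen}


def generate_keywords(question: str, keep_probs: dict[str, float],
                      threshold: float = 0.5) -> str:
    """Keep tokens whose retention probability meets the threshold.
    Words unseen in the seed set are assumed to be content words and kept."""
    return " ".join(t for t in tokenize(question)
                    if keep_probs.get(t, 1.0) >= threshold)


def is_good_pair(keywords: str, question: str, min_terms: int = 2) -> bool:
    """Hypothetical quality filter: enough terms, and the query must be
    strictly shorter than the question it was derived from."""
    n = len(keywords.split())
    return n >= min_terms and n < len(tokenize(question))


def synthesize(seed_pairs, questions, threshold: float = 0.5):
    """Generate (keyword query, question) pairs and keep only those
    that pass the filter."""
    probs = learn_keep_probs(seed_pairs)
    pairs = []
    for q in questions:
        kw = generate_keywords(q, probs, threshold)
        if is_good_pair(kw, q):
            pairs.append((kw, q))
    return pairs
```

For example, with the seed pairs `("capital france", "What is the capital of France?")` and `("python list sort", "How do I sort a list in Python?")`, the question "What is the population of Norway?" yields the synthetic query "population norway", while "What is it?" reduces to the single term "it" and is discarded by the filter. A real system would replace the retention model with a trained sequence-to-sequence generator, but the generate-then-filter structure stays the same.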
Pages: 51-58
Number of pages: 8