KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer

Cited by: 0
Authors
Sohrab, Mohammad Golam [1 ]
Miwa, Makoto [1 ,2 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Artificial Intelligence Res Ctr, Tokyo, Japan
[2] Toyota Technol Inst, Nagoya, Aichi, Japan
Source
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-APPLIED DATA SCIENCE TRACK, PT IX, ECML PKDD 2024 | 2024 / Vol. 14949
Keywords
Natural language processing; Transfer learning; Language model; Sequence-to-Sequence; Language understanding and generation; Information extraction; Machine translation
DOI
10.1007/978-3-031-70378-2_10
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5) by leveraging a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning like T5, a model is first pre-trained on an unsupervised task with a language-model objective before being fine-tuned on a downstream task. T5 explores several learning objectives, including masked language modeling (MLM), random span corruption, and deshuffling, but the model has limited means of integrating external knowledge during pre-training. Here, we push the limits of this model by grafting knowledge such as entity and co-reference information, obtained by mapping Wikipedia to Wikidata, into pre-training. We construct large-scale alignments between Wikipedia abstracts and Wikidata triples to facilitate pre-training the KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular on entity and relation extraction (the CoNLL04, ADE, and NYT datasets) and on language generation tasks, including abstractive summarization (XSum, CNNDM) and machine translation. Our code is publicly released on GitHub (https://github.com/aistairc/kat5) under the Apache 2.0 License.
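To make the knowledge-grafting idea concrete, below is a minimal, hedged sketch of how a Wikipedia abstract might be paired with aligned Wikidata triples to form a single knowledge-augmented T5 denoising example. It is not the authors' released implementation; the triple linearization format, the "[knowledge]" marker, and the corrupt_spans helper are illustrative assumptions, and only the standard Hugging Face T5 API is used.

    # Illustrative sketch only: pairs a Wikipedia abstract with aligned Wikidata
    # triples to build one knowledge-augmented T5 denoising (pre-training) step.
    # The linearization scheme and corrupt_spans helper are assumptions, not the
    # authors' released code.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    abstract = "Barack Obama served as the 44th president of the United States."
    # Wikidata triples aligned to entity mentions in the abstract (assumed format).
    triples = [("Barack Obama", "position held", "President of the United States")]

    def linearize(triples):
        # Flatten each (subject, relation, object) triple into plain text.
        return " ".join(f"{s} | {r} | {o}" for s, r, o in triples)

    def corrupt_spans(text):
        # Toy stand-in for T5-style span corruption: mask one span with a sentinel.
        words = text.split()
        source = " ".join(words[:4] + ["<extra_id_0>"] + words[6:])
        target = "<extra_id_0> " + " ".join(words[4:6]) + " <extra_id_1>"
        return source, target

    source, target = corrupt_spans(abstract)
    # Graft the aligned knowledge onto the corrupted input (one possible scheme).
    source = source + " [knowledge] " + linearize(triples)

    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # one denoising pre-training step
    loss.backward()

In actual pre-training, a pairing of this kind would be applied at scale over the Wikipedia-Wikidata alignments described in the abstract rather than to a single hand-written example.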
Pages: 157-173
Number of pages: 17