PhoBERT: Pre-trained language models for Vietnamese

Cited by: 0
Authors:
Dat Quoc Nguyen [1]
Anh Tuan Nguyen [2]
Affiliations:
[1] VinAI Research, Hanoi, Vietnam
[2] NVIDIA, Santa Clara, CA, USA
DOI: none listed
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including part-of-speech tagging, dependency parsing, named-entity recognition and natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT.
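For downstream applications, the released checkpoints can be loaded through the Hugging Face transformers library. Below is a minimal sketch following the repository's published usage (model ID "vinai/phobert-base"; "vinai/phobert-large" is analogous); the example sentence is illustrative, and note that PhoBERT expects word-segmented Vietnamese input, with the syllables of a multi-syllable word joined by underscores:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load the released base checkpoint and its tokenizer from the Hugging Face hub.
    phobert = AutoModel.from_pretrained("vinai/phobert-base")
    tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

    # Illustrative pre-segmented sentence ("We are researchers."); in practice a
    # word segmenter such as VnCoreNLP produces this underscore-joined form.
    sentence = "Chúng_tôi là những nghiên_cứu_viên ."

    input_ids = torch.tensor([tokenizer.encode(sentence)])
    with torch.no_grad():
        outputs = phobert(input_ids)
    # Contextual token embeddings, shape (1, seq_len, 768) for the base model.
    features = outputs.last_hidden_state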
Pages: 1037-1042 (6 pages)
Related papers (50 in total; the first 10 are listed):
  • [1] Federated Learning for Vietnamese SMS Spam Detection Using Pre-trained PhoBERT
    Hoang Quang Anh, Pham Tuan Anh, Pham Son Nguyen, Phan Duy Hung
    Intelligent Data Engineering and Automated Learning (IDEAL 2024), Part I, 2025, 15346: 254-264
  • [2] ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining
    Minh Phuc Nguyen, Vu Hoang Tran, Vu Hoang, Ta Duc Huy, Trung H. Bui, Steven Q. H. Truong
    LREC 2022: Thirteenth International Conference on Language Resources and Evaluation, 2022: 328-337
  • [3] A Study of Vietnamese Sentiment Classification with Ensemble Pre-trained Language Models
    Dang Van Thin, Duong Ngoc Hao, Ngan Luu-Thuy Nguyen
    Vietnam Journal of Computer Science, 2024, 11(1): 137-165
  • [4] Error Investigation of Pre-trained BERTology Models on Vietnamese Natural Language Inference
    Tin Van Huynh, Huy Quoc To, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
    Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2022), 2022, 1716: 176-188
  • [5] ViDeBERTa: A powerful pre-trained language model for Vietnamese
    Cong Dao Tran, Nhut Huy Pham, Anh Nguyen, Truong Son Hy, Tu Vu
    17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023: 1071-1078
  • [6] Pre-Trained Language Models and Their Applications
    Haifeng Wang, Jiwei Li, Hua Wu, Eduard Hovy, Yu Sun
    Engineering, 2023, 25: 51-65
  • [7] Annotating Columns with Pre-trained Language Models
    Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Cagatay Demiralp, Chen Chen, Wang-Chiew Tan
    Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022: 1493-1503
  • [8] LaoPLM: Pre-trained Language Models for Lao
    Nankai Lin, Yingwen Fu, Ziyu Yang, Chuwei Chen, Shengyi Jiang
    LREC 2022: Thirteenth International Conference on Language Resources and Evaluation, 2022: 6506-6512
  • [9] Deciphering Stereotypes in Pre-Trained Language Models
    Weicheng Ma, Henry Scheible, Brian Wang, Goutham Veeramachaneni, Pratim Chowdhary, Alan Sung, Andrew Koulogeorge, Lili Wang, Diyi Yang, Soroush Vosoughi
    2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 11328-11345
  • [10] Knowledge Rumination for Pre-trained Language Models
    Yunzhi Yao, Peng Wang, Shengyu Mao, Chuanqi Tan, Fei Huang, Huajun Chen, Ningyu Zhang
    2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 3387-3404