BiCalBERT: An Efficient Transformer-based Model for Chinese Question Answering

Cited by: 1
Authors
Han, Yanbo [1 ]
Zhan, Buchao [1 ]
Zhang, Bin [2 ,3 ]
Zhao, Chao [3 ]
Yan, Shankai [1 ]
Affiliations
[1] Hainan Univ, Sch Comp Sci & Technol, Haikou, Hainan, Peoples R China
[2] Xinjiang Univ Polit Sci, Sch Informat Network Secur, Xinjiang, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Source
2024 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, METAHEURISTICS & SWARM INTELLIGENCE, ISMSI 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Question Answering; Transformer; NLP;
DOI
10.1145/3665065.3665081
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The exponentially growing content on the Internet, including online publications, scientific news, and other expert websites, poses a formidable challenge for extracting pertinent answers from such a vast body of information. We propose a deep neural network model based on BiLSTM (Bidirectional Long Short-Term Memory) and ALBERT (A Lite BERT for Self-Supervised Learning of Language Representations) for Chinese question answering in the scientific domain. Our model significantly enhances the generalization capability of transformer-based models for question answering. Its character-extraction and word-embedding modules are designed to tackle intricate science-related queries, rapidly assimilate knowledge from cutting-edge scientific literature, and contribute to constructing a comprehensive scientific knowledge graph. The model is compact, fast to execute, and achieves satisfactory accuracy. It is fine-tuned with an emphasis on multi-sentence coherence augmented by attention mechanisms, which ensures robust scalability. Empirical evaluations on the LCQMC and XNLI datasets demonstrate that our approach outperforms BERT and ALBERT, showing its potential for large-scale Chinese question-matching problems.
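The abstract describes an ALBERT encoder combined with a BiLSTM and attention pooling for Chinese question matching. The following is a minimal, hypothetical PyTorch sketch of such an architecture, not the authors' released code: the checkpoint name voidful/albert_chinese_tiny, the class name BiCalBertSketch, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch of an ALBERT + BiLSTM + attention model for
# Chinese sentence-pair matching (e.g., LCQMC), per the abstract's
# description; checkpoint and hyperparameters are assumptions.
import torch
import torch.nn as nn
from transformers import AlbertModel, BertTokenizer

class BiCalBertSketch(nn.Module):
    def __init__(self, checkpoint="voidful/albert_chinese_tiny",
                 lstm_hidden=256, num_labels=2):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        # Bidirectional LSTM over ALBERT's token representations.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Additive attention pooling over the BiLSTM states.
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           token_type_ids=token_type_ids)
        states, _ = self.bilstm(out.last_hidden_state)
        scores = self.attn(states).squeeze(-1)
        # Mask out padding positions before the softmax.
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * states).sum(dim=1)  # attention-weighted pooling
        return self.classifier(pooled)

# Chinese ALBERT checkpoints on Hugging Face are typically paired with
# BertTokenizer; the question pair below is an illustrative example.
tokenizer = BertTokenizer.from_pretrained("voidful/albert_chinese_tiny")
model = BiCalBertSketch()
batch = tokenizer("天气怎么样?", "今天天气如何?", return_tensors="pt")
logits = model(**batch)  # shape (1, 2): match / no-match scores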
Pages: 100-104
Page count: 5