DisenQNet: Disentangled Representation Learning for Educational Questions

Cited by: 30
Authors
Huang, Zhenya [1]
Lin, Xin [1]
Wang, Hao [1]
Liu, Qi [1]
Chen, Enhong [1]
Ma, Jianhui [1]
Su, Yu [1,2]
Tong, Wei [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei, Anhui, Peoples R China
[2] iFLYTEK Co Ltd, iFLYTEK Res, Hefei, Anhui, Peoples R China
Source
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021
Funding
National Natural Science Foundation of China
Keywords
question learning; disentangled representation; mutual information
DOI
10.1145/3447548.3467347
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Learning informative representations for educational questions is a fundamental problem in online learning systems and can benefit many downstream applications, e.g., difficulty estimation. Most existing solutions integrate all the information of a question into a single representation in a supervised manner, which is often unsatisfactory for several reasons. First, the scarcity of labeled data limits representation quality. Second, label-dependent representations transfer poorly to other tasks. Moreover, aggregating all information into one unified representation can introduce noise in downstream applications, since it cannot distinguish the diverse characteristics of questions. In this paper, we aim to learn disentangled representations of questions. We propose a novel unsupervised model, DisenQNet, that divides a question's representation into two parts: a concept representation that captures its explicit concept meaning, and an individual representation that preserves its personal characteristics. We achieve this via mutual information estimation, proposing three self-supervised estimators trained on a large unlabeled question corpus. We then propose an enhanced model, DisenQNet+, which transfers representation knowledge from unlabeled questions to labeled questions in specific applications by maximizing the mutual information between the two. Extensive experiments on real-world datasets demonstrate that DisenQNet generates effective and meaningful disentangled question representations, and that DisenQNet+ further improves the performance of different downstream applications.
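The abstract describes the core mechanism at a high level: split each question's representation into a concept part and an individual part, and shape the split with mutual information (MI) estimators trained on an unlabeled corpus. The sketch below illustrates that idea, assuming a Deep-InfoMax-style Jensen-Shannon MI estimator as a stand-in for the paper's three self-supervised estimators; the module names, dimensions, and loss combination are illustrative assumptions, not the authors' DisenQNet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIEstimator(nn.Module):
    """Jensen-Shannon mutual information estimator (Deep-InfoMax style).

    A small critic network scores (x, y) pairs; the gap between its scores
    on joint samples and on shuffled (marginal) samples lower-bounds the
    mutual information between x and y.
    """
    def __init__(self, dim_x: int, dim_y: int, hidden: int = 256):
        super().__init__()
        self.critic = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        joint = self.critic(torch.cat([x, y], dim=-1))        # aligned pairs
        y_shuffled = y[torch.randperm(y.size(0))]             # break alignment
        marginal = self.critic(torch.cat([x, y_shuffled], dim=-1))
        # JS lower bound: E_joint[-softplus(-T)] - E_marginal[softplus(T)]
        return (-F.softplus(-joint)).mean() - F.softplus(marginal).mean()

class DisentangledQuestionEncoder(nn.Module):
    """Maps a question embedding to a (concept, individual) pair."""
    def __init__(self, dim_in: int = 768, dim_z: int = 128):
        super().__init__()
        self.concept_head = nn.Linear(dim_in, dim_z)
        self.individual_head = nn.Linear(dim_in, dim_z)

    def forward(self, q: torch.Tensor):
        return self.concept_head(q), self.individual_head(q)

# Illustrative unsupervised training step: keep each part informative about
# the question (maximize MI with it) while pushing the two parts apart
# (minimize their estimated MI). A full treatment would train the cross-part
# estimator adversarially (separate optimizers or gradient reversal).
encoder = DisentangledQuestionEncoder()
mi_q_concept = MIEstimator(768, 128)
mi_q_individual = MIEstimator(768, 128)
mi_between_parts = MIEstimator(128, 128)

q = torch.randn(32, 768)  # stand-in for a batch of question text embeddings
z_c, z_i = encoder(q)
loss = (-(mi_q_concept(q, z_c) + mi_q_individual(q, z_i))
        + mi_between_parts(z_c, z_i))
loss.backward()
```

The sketch collapses the encoder and estimator objectives into a single loss for brevity; in practice the cross-part term is adversarial, since the estimator tightens the MI bound while the encoder works against it.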
Pages: 696-704
Number of pages: 9