Inference during reading: multi-label classification for text with continuous semantic units

Cited by: 0
Authors
Xuetao Tian
Liping Jing
Fang Luo
Feng Liu
Affiliations
[1] Beijing Jiaotong University, School of Computer and Information Technology
[2] Beijing Normal University, Collaborative Innovation Center of Assessment toward Basic Education Quality
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Multi-label text classification; Continuous semantic units; Cognitive text understanding; Text mining
DOI
Not available
Abstract
With the development of electronic platforms in areas such as education and commerce, text with continuous semantic units (CSU-text) is emerging in large numbers. Each CSU-text is usually short but contains a series of independent semantic units. Mining CSU-text helps determine users' preferences or intentions and, in turn, improve service quality. Although many text mining techniques exist, they handle CSU-text poorly because they usually learn a single representation for the whole document, so the information hidden in the individual semantic units cannot be sufficiently captured. Inspired by how, according to cognitive science, a human being understands a text and acquires knowledge, this paper treats multi-label classification for CSU-text as a sequence tagging task and proposes a novel inference during reading (InfDR) model. The model simultaneously partitions continuous semantic units and maps them to semantic labels. Extensive experiments on three real-world datasets demonstrate that the proposed model is effective and significantly outperforms existing baselines that use a single text representation.
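To make the sequence tagging framing concrete, the following is a minimal sketch of how per-token tags could be turned into semantic units and a label set. It is not the InfDR model: the BIO-style tag scheme, the function names `decode_units` and `label_set`, and the example tokens and labels are all assumptions introduced here for illustration; the abstract does not specify the tagging scheme.

```python
from typing import List, Optional, Set, Tuple


def decode_units(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (unit_text, label) pairs.

    Assumes a BIO-style scheme (hypothetical, not taken from the paper):
    "B-x" opens a unit with label x, "I-x" continues it, "O" is outside.
    """
    units: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the unit being built, if there is one.
        if current_tokens and current_label is not None:
            units.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        starts_unit = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_unit:
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or any unrecognized tag ends the current unit
            flush()
            current_tokens = []
            current_label = None
    flush()
    return units


def label_set(units: List[Tuple[str, str]]) -> Set[str]:
    """The multi-label prediction is the set of labels over all units."""
    return {label for _, label in units}


# Hypothetical short text with two independent semantic units.
tokens = ["fast", "delivery", "but", "screen", "is", "dim"]
tags = ["B-logistics", "I-logistics", "O", "B-quality", "I-quality", "I-quality"]
units = decode_units(tokens, tags)
labels = label_set(units)
```

In this sketch the partition and the labels come out of the same tag sequence, which mirrors the abstract's point that segmentation and label assignment happen jointly rather than from one document-level representation.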
Pages: 6292-6305
Page count: 13