Disease Inference from Health-Related Questions via Sparse Deep Learning

Cited by: 126
Authors
Nie, Liqiang [1]
Wang, Meng [3]
Zhang, Luming [1]
Yan, Shuicheng [2]
Zhang, Bo [4]
Chua, Tat-Seng [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore
[3] Hefei Univ Technol, Hefei, Peoples R China
[4] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Keywords
Community-based health services; question answering; disease inference; deep learning
DOI
10.1109/TKDE.2015.2399298
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Automatic disease inference is important for bridging the gap between what online health seekers with unusual symptoms need and what busy human doctors with biased expertise can offer. However, inferring diseases accurately and efficiently is non-trivial, especially for community-based health services, because of the vocabulary gap, incomplete information, correlated medical concepts, and the limited number of high-quality training samples. In this paper, we first report a user study on the information needs of health seekers as expressed in their questions, and then select the questions that ask for possible diseases behind the manifested symptoms for further analysis. We next propose a novel deep learning scheme to infer the possible diseases given the questions of health seekers. The proposed scheme comprises two key components. The first globally mines the discriminant medical signatures from raw features. The second treats the raw features and their signatures as input nodes in one layer and hidden nodes in the subsequent layer, respectively, and learns the inter-relations between these two layers via pre-training with pseudo-labeled data. The hidden nodes then serve as raw features for mining more abstract signatures. By repeating these two components incrementally and alternately, our scheme builds a sparsely connected deep architecture with three hidden layers. Overall, it adapts well to specific tasks through fine-tuning. Extensive experiments on a real-world dataset labeled by online doctors show the significant performance gains of our scheme.
Pages: 2107-2119
Number of pages: 13