Self-taught learning via exponential family sparse coding for cost-effective patient thought record categorization

Cited by: 0

Authors:

Hua Wang

Heng Huang

Monica Basco

Molly Lopez

Fillia Makedon

Affiliations:

[1] Colorado School of Mines, Department of Electrical Engineering and Computer Science

[2] University of Texas at Arlington, Department of Computer Science and Engineering

[3] University of Texas at Arlington, Department of Psychology

[4] University of Texas, School of Social Work

Source:

Personal and Ubiquitous Computing | 2014 / Vol. 18

Keywords:

Major depressive disorder; Cognitive behavior therapy; Thought record; Self-taught learning; Exponential family; Cost-effective classification;

DOI:

Not available

Abstract:

Automatic categorization of patient thought records (TRs) is important in cognitive behavior therapy, a useful augmentation of standard clinical treatment for major depressive disorder. Because both collecting and labeling TR data are expensive, it is usually cost prohibitive to require a large amount of TR data, along with the corresponding category labels, to train a highly accurate classification model. Because in practice only a very limited amount of labeled and unlabeled TR training data is available, traditional semi-supervised learning and transfer learning methods, the most commonly used strategies for dealing with a lack of training data in statistical learning, do not work well for automatic TR categorization. To address this challenge, we propose to tackle the TR categorization problem from a new perspective via self-taught learning, an emerging technique in machine learning. Self-taught learning is a special type of transfer learning. Instead of requiring labeled data from an auxiliary domain relevant to the classification task of interest, as traditional transfer learning methods do, it learns the inherent structure of the auxiliary data and does not require their labels. As a result, a classifier can achieve decent classification accuracy using the limited amount of labeled TR texts, assisted by a large amount of text data obtained from inexpensive, or even no-cost, resources. That is, a cost-effective TR categorization system can be built that may be particularly useful for diagnosing patients and training new therapists. Furthermore, to account for the discrete nature of the input text data, we use exponential family sparse coding instead of the traditional Gaussian sparse coding in self-taught learning, to better model the distribution of the input data. We apply the proposed method to the task of classifying patient homework texts. Experimental results show the effectiveness of the proposed automatic TR classification framework.
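
The encoding step that the abstract describes can be illustrated with a minimal sketch. It assumes a Poisson member of the exponential family for word-count vectors, a basis matrix `B` already learned from unlabeled auxiliary text, and a proximal gradient solver with backtracking; the abstract specifies none of these, so the function names, the Poisson choice, and the solver are all illustrative assumptions rather than the authors' implementation. The resulting sparse activations `s` would then serve as features for a classifier trained on the small labeled TR set.

```python
import numpy as np

def poisson_nll(x, B, s):
    """Poisson negative log-likelihood (up to a constant) with natural parameter eta = B @ s."""
    eta = B @ s
    return float(np.sum(np.exp(eta) - x * eta))

def objective(x, B, s, beta):
    """Sparse coding objective: data fit plus an L1 penalty on the activations."""
    return poisson_nll(x, B, s) + beta * float(np.sum(np.abs(s)))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def infer_activations(x, B, beta=0.1, n_iter=200):
    """Hypothetical helper: proximal gradient with backtracking for a fixed basis B."""
    s = np.zeros(B.shape[1])
    step = 1.0
    for _ in range(n_iter):
        grad = B.T @ (np.exp(B @ s) - x)   # gradient of the Poisson term
        f_old = poisson_nll(x, B, s)
        while True:
            s_new = soft_threshold(s - step * grad, step * beta)
            diff = s_new - s
            # Accept the step once the quadratic upper bound holds.
            if poisson_nll(x, B, s_new) <= f_old + grad @ diff + (diff @ diff) / (2 * step):
                break
            step *= 0.5
        s = s_new
    return s
```

Replacing the Gaussian squared-error term with the Poisson likelihood is what lets the code respect non-negative integer counts; with a very large `beta`, every activation is thresholded to zero.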

Pages: 27-35

Page count: 8
