A text classification framework for simple and effective early depression detection over social media streams

Citations: 100
Authors
Burdisso, Sergio G. [1, 2]
Errecalde, Marcelo [1]
Montes-y-Gomez, Manuel [3 ]
Affiliations
[1] UNSL, Ejercito Andes 950, RA-5700 San Luis, Argentina
[2] Consejo Nacl Invest Cientif & Tecn CONICET, Buenos Aires, DF, Argentina
[3] INAOE, Luis Enrique Erro 1, Puebla 72840, Mexico
Keywords
Early text classification; Early depression detection; Incremental classification; SS3; Interpretability; Explainability
DOI
10.1016/j.eswa.2019.05.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale. © 2019 Elsevier Ltd. All rights reserved.
Pages: 182-197
Number of pages: 16