Examining the Gateway Hypothesis and Mapping Substance Use Pathways on Social Media: Machine Learning Approach

Cited by: 0
Authors
Yuan, Yunhao [1]
Kasson, Erin [2]
Taylor, Jordan [3]
Cavazos-Rehg, Patricia [2]
De Choudhury, Munmun [4]
Aledavood, Talayeh [1]
Affiliations
[1] Aalto Univ, Dept Comp Sci, POB 11000,Otakaari 1B, FI-00076 Espoo, Finland
[2] Washington Univ, Sch Med, St Louis, MO USA
[3] Carnegie Mellon Univ, Pittsburgh, PA USA
[4] Georgia Inst Technol, Atlanta, GA USA
Keywords
gateway hypothesis; substance use; social media; deep learning; natural language processing; DRUG-USE; PATTERNS; MARIJUANA; COMMUNITY; SUPPORT; ONLINE;
DOI
10.2196/54433
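Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract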
Background: Substance misuse presents significant global public health challenges. Understanding transitions between substance types and the timing of shifts to polysubstance use is vital to developing effective prevention and recovery strategies. The gateway hypothesis suggests that high-risk substance use is preceded by lower-risk substance use. However, the source of this correlation is hotly contested. While some claim that low-risk substance use causes subsequent, riskier substance use, most people using low-risk substances also do not escalate to higher-risk substances. Social media data hold the potential to shed light on the factors contributing to substance use transitions.
Objective: By leveraging social media data, our study aimed to gain a better understanding of substance use pathways. By identifying and analyzing the transitions of individuals between different risk levels of substance use, our goal was to find specific linguistic cues in individuals' social media posts that could indicate escalating or de-escalating patterns in substance use.
Methods: We conducted a large-scale analysis using data from Reddit, collected between 2015 and 2019, consisting of over 2.29 million posts and approximately 29.37 million comments by around 1.4 million users from subreddits. These data, derived from substance use subreddits, facilitated the creation of a risk transition data set reflecting the substance use behaviors of over 1.4 million users. We deployed deep learning and machine learning techniques to predict the escalation or de-escalation transitions in risk levels, based on initial transition phases documented in posts and comments. We conducted a linguistic analysis to analyze the language patterns associated with transitions in substance use, emphasizing the role of n-gram features in predicting future risk trajectories.
Results: Our results showed promise in predicting the escalation or de-escalation transition in risk levels, based on the historical data of Reddit users created on initial transition phases among drug-related subreddits, with an accuracy of 78.48% and an F1-score of 79.20%. We highlighted the vital predictive features, such as specific substance names and tools indicative of future risk escalations. Our linguistic analysis showed that terms linked with harm reduction strategies were instrumental in signaling de-escalation, whereas descriptors of frequent substance use were characteristic of escalating transitions.
Conclusions: This study sheds light on the complexities surrounding the gateway hypothesis of substance use through an examination of web-based behavior on Reddit. While certain findings validate the hypothesis, indicating a progression from lower-risk substances such as marijuana to higher-risk ones, a significant number of individuals did not show this transition. The research underscores the potential of using machine learning with social media analysis to predict substance use transitions. Our results point toward future directions for leveraging social media data in substance use research, underlining the importance of continued exploration before suggesting direct implications for interventions.
Pages: 19