Examining the Gateway Hypothesis and Mapping Substance Use Pathways on Social Media: Machine Learning Approach

Cited by: 0
Authors
Yuan, Yunhao [1]
Kasson, Erin [2]
Taylor, Jordan [3]
Cavazos-Rehg, Patricia [2]
De Choudhury, Munmun [4]
Aledavood, Talayeh [1]
Affiliations
[1] Aalto Univ, Dept Comp Sci, POB 11000, Otakaari 1B, FI-00076 Espoo, Finland
[2] Washington Univ, Sch Med, St Louis, MO USA
[3] Carnegie Mellon Univ, Pittsburgh, PA USA
[4] Georgia Inst Technol, Atlanta, GA USA
Keywords
gateway hypothesis; substance use; social media; deep learning; natural language processing; DRUG-USE; PATTERNS; MARIJUANA; COMMUNITY; SUPPORT; ONLINE;
DOI
10.2196/54433
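Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract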
Background: Substance misuse presents significant global public health challenges. Understanding transitions between substance types and the timing of shifts to polysubstance use is vital to developing effective prevention and recovery strategies. The gateway hypothesis suggests that high-risk substance use is preceded by lower-risk substance use. However, the source of this correlation is hotly contested. While some claim that low-risk substance use causes subsequent, riskier substance use, most people using low-risk substances also do not escalate to higher-risk substances. Social media data hold the potential to shed light on the factors contributing to substance use transitions.
Objective: By leveraging social media data, our study aimed to gain a better understanding of substance use pathways. By identifying and analyzing the transitions of individuals between different risk levels of substance use, our goal was to find specific linguistic cues in individuals' social media posts that could indicate escalating or de-escalating patterns in substance use.
Methods: We conducted a large-scale analysis using data from Reddit, collected between 2015 and 2019, consisting of over 2.29 million posts and approximately 29.37 million comments by around 1.4 million users from subreddits. These data, derived from substance use subreddits, facilitated the creation of a risk transition data set reflecting the substance use behaviors of over 1.4 million users. We deployed deep learning and machine learning techniques to predict the escalation or de-escalation transitions in risk levels, based on initial transition phases documented in posts and comments. We conducted a linguistic analysis to analyze the language patterns associated with transitions in substance use, emphasizing the role of n-gram features in predicting future risk trajectories.
Results: Our results showed promise in predicting the escalation or de-escalation transition in risk levels, based on the historical data of Reddit users created on initial transition phases among drug-related subreddits, with an accuracy of 78.48% and an F1-score of 79.20%. We highlighted the vital predictive features, such as specific substance names and tools indicative of future risk escalations. Our linguistic analysis showed that terms linked with harm reduction strategies were instrumental in signaling de-escalation, whereas descriptors of frequent substance use were characteristic of escalating transitions.
Conclusions: This study sheds light on the complexities surrounding the gateway hypothesis of substance use through an examination of web-based behavior on Reddit. While certain findings validate the hypothesis, indicating a progression from lower-risk substances such as marijuana to higher-risk ones, a significant number of individuals did not show this transition. The research underscores the potential of using machine learning with social media analysis to predict substance use transitions. Our results point toward future directions for leveraging social media data in substance use research, underlining the importance of continued exploration before suggesting direct implications for interventions.
Pages: 19