Automated Credibility Assessment of Web-Based Health Information Considering Health on the Net Foundation Code of Conduct (HONcode): Model Development and Validation Study

Cited by: 0
Authors
Bayani, Azadeh [1, 2, 3]
Ayotte, Alexandre [1, 2, 3]
Nikiema, Jean Noel [1, 2, 3, 4]
Affiliations
[1] Univ Montreal, Ctr Rech Sante Publ, Montreal, PQ H3C 3J7, Canada
[2] Ctr Integre Univ Sante & Serv Sociaux Ctr Sud Ile, Montreal, PQ H3C 3J7, Canada
[3] Lab Transformat Numer Sante, Montreal, PQ, Canada
[4] Univ Montreal, Sch Publ Hlth, Dept Management Evaluat & Hlth Policy, Montreal, PQ, Canada
Keywords
HONcode; infodemic; natural language processing; web-based health information; machine learning; CONFORMITY; INTERNET
DOI
10.2196/52995
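Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract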
Background: An increasing number of users are turning to web-based sources as an important source of health care guidance. Trustworthy sources of information should therefore be automatically identifiable using objective criteria.
Objective: This study aimed to automate the assessment of the Health on the Net Foundation Code of Conduct (HONcode) criteria, making it easier to identify trustworthy sources of health information.
Methods: A data set of 538 web pages displaying health content was collected from 43 health-related websites. HONcode criteria were considered at 2 levels: web page and website. For the website-level criteria (confidentiality, transparency, financial disclosure, and advertising policy), a bag of keywords was identified to assess each criterion with a rule-based model. For the web page-level criteria (authority, complementarity, justifiability, and attribution), several machine learning (ML) approaches were used. In total, 200 web pages were manually annotated until a balanced representation in terms of frequency was achieved. Three ML models (random forest, support vector machines [SVM], and Bidirectional Encoder Representations from Transformers [BERT]) were trained on the initial annotated data. A second training step was implemented for the complementarity criterion, using the BERT model for multiclass classification of the complementarity sentences obtained by annotation and data augmentation (positive, negative, and noncommittal sentences). Finally, the remaining web pages were classified using the selected model, and 100 sentences were randomly selected for manual review.
Results: For the web page-level criteria, the random forest model performed well for the attribution criterion but displayed subpar performance for the others. BERT and SVM performed stably across all criteria. BERT had the better area under the curve (AUC): 0.96, 0.98, and 1.00 for neutral sentences, justifiability, and attribution, respectively. SVM had the better overall performance for the classification of complementarity, with an AUC of 0.98. Finally, SVM and BERT had an equal AUC of 0.98 for the authority criterion. For the website-level criteria, the rule-based model retrieved web pages with an accuracy of 0.97 for confidentiality, 0.82 for transparency, and 0.51 for both financial disclosure and advertising policy. The final evaluation of the sentences yielded a precision of 0.88, and the agreement level between reviewers was 0.82.
Conclusions: Our results show the potential of automating the HONcode criteria assessment using ML approaches. This approach could be used with different types of pretrained models to accelerate text annotation and classification and to improve performance in low-resource cases. Further work is needed to determine how to assign different weights to the criteria and to identify additional characteristics that should be considered when consolidating these criteria into a comprehensive reliability score.
(JMIR Form Res 2023;7:e52995) doi: 10.2196/52995
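The rule-based assessment of the website-level criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual keywords or the matching rule, so the keyword bags, the function name `assess_website`, and the "any keyword on any page" rule below are all illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical keyword bags: the abstract does not publish the real ones,
# so every term here is an illustrative assumption.
CRITERION_KEYWORDS = {
    "confidentiality": ["privacy policy", "personal data", "confidentiality"],
    "transparency": ["contact us", "about us", "email"],
    "financial_disclosure": ["funding", "funded by", "sponsor"],
    "advertising_policy": ["advertising policy", "advertisement", "ads"],
}


def assess_website(pages):
    """Return {criterion: bool} for a website given the text of its pages.

    Assumed rule: a criterion counts as met if any keyword from its bag
    appears as a whole word or phrase anywhere in the site's pages.
    """
    text = " ".join(pages).lower()
    results = {}
    for criterion, keywords in CRITERION_KEYWORDS.items():
        results[criterion] = any(
            re.search(r"\b" + re.escape(keyword) + r"\b", text)
            for keyword in keywords
        )
    return results


if __name__ == "__main__":
    example_pages = [
        "Read our Privacy Policy.",
        "This site is funded by a public grant.",
    ]
    print(assess_website(example_pages))
```

Whole-word matching is used here so that short keywords such as "ads" do not fire inside longer words; a real keyword bag would likely need stemming or synonyms, which may partly explain why simple rules scored lower on criteria such as financial disclosure.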
Pages: 14