Consumer Health Search on the Web: Study of Web Page Understandability and Its Integration in Ranking Algorithms

Cited by: 16
Authors
Palotti, Joao [1,2]
Zuccon, Guido [3]
Hanbury, Allan [2,4]
Affiliations
[1] Qatar Comp Res Inst, Doha, Qatar
[2] Tech Univ Wien, Inst Informat Syst Engn, Favoritenstr 9-11-194 04, A-1040 Vienna, Austria
[3] Univ Queensland, Brisbane, Qld, Australia
[4] Complex Sci Hub Vienna, Vienna, Austria
Funding
European Union Horizon 2020; Australian Research Council
Keywords
readability; literacy; comprehension; patients; machine learning; PATIENT EDUCATION MATERIALS; READABILITY ASSESSMENT; LITERACY; CYBERCHONDRIA; INFORMATION; DIFFICULTY; INTERNET; FORMULA;
DOI
10.2196/10986
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Understandability plays a key role in ensuring that people accessing health information can gain insights that assist them with their health concerns and choices. Access to unclear or misleading information has been shown to negatively affect the health decisions of the general public.
Objective: This study aimed to investigate methods for estimating the understandability of health Web pages and to use these estimates to improve the retrieval of information for people seeking health advice on the Web.
Methods: Our investigation considered methods to automatically estimate the understandability of health information in Web pages. It provided a thorough evaluation of these methods against human assessments, together with an analysis of the preprocessing factors that affect understandability estimates and the associated pitfalls. The lessons learned for estimating Web page understandability were then applied to the construction of retrieval methods, with specific attention to retrieving information understandable by the general public.
Results: We found that machine learning techniques were more suitable for estimating health Web page understandability than traditional readability formulae, which health information providers on the Web often use as guidelines and benchmarks. The larger difference was found on the Conference and Labs of the Evaluation Forum eHealth 2015 collection: a Pearson correlation of .602 using a gradient boosting regressor compared with .438 using the Simple Measure of Gobbledygook Index.
Conclusions: The findings reported in this paper are important for specialized search services tailored to support the general public in seeking health advice on the Web, as they document and empirically validate state-of-the-art techniques and settings for this domain application.
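The comparison reported in the Results rests on two ingredients: a readability formula applied to page text, and a Pearson correlation between the estimated scores and human assessments. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The Simple Measure of Gobbledygook (SMOG) grade uses the standard published constants, but the syllable counter is a rough vowel-group heuristic, and the function names `count_syllables`, `smog_grade`, and `pearson` are hypothetical. The abstract also notes that preprocessing choices affect such estimates; this sketch assumes plain text with HTML already removed.

```python
import math
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (assumption: crude heuristic,
    not a pronunciation dictionary; it can over- or under-count)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text):
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291.

    Sentence splitting on '.', '!' and '?' is a simplifying assumption.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllabic words are those with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

In use, `smog_grade` would be applied to each page, and `pearson` would compare the resulting scores with the human understandability labels for the same pages. A learned estimator such as a gradient boosting regressor would be evaluated the same way, with its predictions taking the place of the formula scores.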
Pages: 27