A Semi-Supervised Learning Approach to Enhance Health Care Community-Based Question Answering: A Case Study in Alcoholism

Cited by: 8
Authors
Wongchaisuwat, Papis [1]
Klabjan, Diego [1]
Jonnalagadda, Siddhartha Reddy [2]
Affiliations
[1] Northwestern Univ, Dept Ind Engn & Management Sci, 2145 Sheridan Rd, Evanston, IL 60208 USA
[2] Northwestern Univ, Feinberg Sch Med, Div Hlth & Biomed Informat, Chicago, IL 60611 USA
Source
JMIR MEDICAL INFORMATICS | 2016 / Vol. 4 / Issue 03
Keywords
machine learning; natural language processing; question answering; Web-based health communities; consumer health informatics; TEXT;
DOI
10.2196/medinform.5490
Chinese Library Classification number
R-058
Subject classification code
Abstract
Background: Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities. Objective: In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand which information embedded within Web-based health content provides good features for identifying valid answers. Methods: Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline. Results: On our dataset, the semi-supervised learning algorithm has an accuracy of 86.2%. Unified Medical Language System (UMLS)-based (health-related) features used in the model enhance the algorithm's performance by approximately 8%. A reasonably high rate of accuracy is obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions. Conclusions: Overall, our automated QA system based on historical QA pairs is shown to be effective according to the dataset in this case study. It is developed for general use in the health care domain and can also be applied to other CQA sites.
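The retrieval step and the features named in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, so Jaccard distance over word sets is assumed here; the tiny `STOP_WORDS` and `HEALTH_TERMS` sets are hypothetical stand-ins (the paper uses UMLS-based features); and all function names are invented for illustration. The semi-supervised ranking step itself is not shown.

```python
import re

# Hypothetical stand-ins for illustration only; the paper relies on UMLS-based
# features, not on a hand-written term list like this one.
STOP_WORDS = {"a", "an", "the", "is", "i", "to", "of", "and", "how", "do", "can", "my", "for", "in", "it"}
HEALTH_TERMS = {"alcohol", "alcoholism", "withdrawal", "detox", "liver", "drinking"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard_distance(tokens_a, tokens_b):
    """Assumed distance: 1 minus the Jaccard similarity of the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def features(test_question, past_question, past_answer):
    """Compute the four feature types listed in the abstract for one candidate."""
    q = tokenize(test_question)
    p = tokenize(past_question)
    return {
        "answer_length": len(tokenize(past_answer)),
        "stop_words_in_question": sum(1 for t in q if t in STOP_WORDS),
        "question_distance": jaccard_distance(q, p),
        "health_term_overlap": len(set(q) & set(p) & HEALTH_TERMS),
    }


def retrieve_candidates(test_question, qa_pairs, k=2):
    """Return the k resolved (question, answer) pairs closest to the test question."""
    q = tokenize(test_question)
    ranked = sorted(qa_pairs, key=lambda qa: jaccard_distance(q, tokenize(qa[0])))
    return ranked[:k]


qa_pairs = [
    ("How do I cope with alcohol withdrawal?", "See a doctor about detox."),
    ("What is the best pizza topping?", "Mushrooms."),
    ("Is drinking bad for the liver?", "Yes, heavy drinking damages the liver."),
]
test_question = "How can I manage alcohol withdrawal symptoms?"
top = retrieve_candidates(test_question, qa_pairs, k=1)[0]
feats = features(test_question, top[0], top[1])
```

In a full pipeline, feature dictionaries like `feats` would be the input to the ranking model; here they only show how each listed feature could be computed from a question and a candidate QA pair.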
Pages: 18 / 30
Page count: 13
Related papers
14 in total
  • [1] Case-Base Maintenance: An Approach Based on Active Semi-Supervised Learning
    Chebli, Asma
    Djebbar, Akila
    Merouani, Hayet Farida
    Lounis, Hakim
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (11)
  • [2] Leveraging linguistic traits and semi-supervised learning to single out informational content across how-to community question-answering archives
    Palomera, Daniel
    Figueroa, Alejandro
    INFORMATION SCIENCES, 2017, 381 : 20 - 32
  • [3] A Semi-Supervised Learning Approach to Quality-Based Web Service Classification
    Bonab, Mehdi Nozad
    Tanha, Jafar
    Masdari, Mohammad
    IEEE ACCESS, 2024, 12 : 50489 - 50503
  • [4] A Novel Semi-Supervised Learning Model for Smartphone-Based Health Telemonitoring
    Gaw, Nathan
    Li, Jing
    Yoon, Hyunsoo
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (01) : 428 - 441
  • [5] Semi-supervised clustering-based method for fault diagnosis and prognosis: A case study
    Azar, Kamyar
    Hajiakhondi-Meybodi, Zohreh
    Naderkhani, Farnoosh
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2022, 222
  • [6] Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors
    Tran, Quynh T.
    Alom, Md Zahangir
    Orr, Brent A.
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [7] Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors
    Quynh T. Tran
    Md Zahangir Alom
    Brent A. Orr
    BMC Bioinformatics, 23
  • [8] T5-Based Model for Abstractive Summarization: A Semi-Supervised Learning Approach with Consistency Loss Functions
    Wang, Mingye
    Xie, Pan
    Du, Yao
    Hu, Xiaohui
    APPLIED SCIENCES-BASEL, 2023, 13 (12)
  • [9] Multiscale and Adversarial Learning-Based Semi-Supervised Semantic Segmentation Approach for Crack Detection in Concrete Structures
    Shim, Seungbo
    Kim, Jin
    Cho, Gye-Chun
    Lee, Seong-Won
    IEEE ACCESS, 2020, 8 : 170939 - 170950
  • [10] An Accurate Recognition Method for Landslides Based on a Semi-Supervised Generative Adversarial Network: A Case Study in Lanzhou City
    Lu, Wenjuan
    Zhao, Zhan'ao
    Mao, Xi
    Cheng, Yao
    APPLIED SCIENCES-BASEL, 2024, 14 (12)