Automated Detection of Health Websites' HONcode Conformity: Can N-gram Tokenization Replace Stemming?

Cited by: 5
Authors
Boyer, Celia [1 ]
Dolamic, Ljiljana [1 ]
Grabar, Natalia [2 ]
Affiliations
[1] Health On the Net Foundation, Geneva, Switzerland
[2] Univ Lille 3, Villeneuve d'Ascq, France
Source
MEDINFO 2015: EHEALTH-ENABLED HEALTH | 2015 / Volume 216
Keywords
Machine learning; N-gram; HONcode
DOI
10.3233/978-1-61499-564-7-1064
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The authors evaluated supervised automatic classification algorithms for determining whether health-related web pages comply with individual HONcode criteria of conduct, using character n-gram vectors of varying length to represent healthcare web page documents. The training/testing collection comprised web page fragments extracted by HONcode experts during the manual certification process. The authors compared the automated classification performance of n-gram tokenization with that of document words and Porter-stemmed document words, using a Naive Bayes classifier and DF (document frequency) dimensionality reduction. The study sought to determine whether the automated, language-independent approach might safely replace word-based classification. Using 5-grams as document features, the authors also compared the baseline DF reduction function with Chi-square and Z-score dimensionality reduction. Overall, the results indicate that n-gram tokenization provides a potentially viable alternative to document word stemming.
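The pipeline named in the abstract (character n-gram tokenization, DF-based feature reduction, Naive Bayes classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the top-k form of the DF cutoff, the Laplace smoothing, the multinomial model, and the toy texts and labels are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy fragments and labels, invented for illustration only.
TOY_TEXTS = [
    "we protect your privacy and personal data",
    "your personal data stays confidential",
    "this site is funded by advertising revenue",
    "funding comes from donations and advertising",
]
TOY_LABELS = ["privacy", "privacy", "funding", "funding"]


def char_ngrams(text, n=5):
    """Split text into overlapping character n-grams (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def select_by_df(tokenized_docs, top_k):
    """Keep the top_k n-grams that occur in the largest number of documents."""
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each n-gram once per document
    ranked = sorted(df, key=lambda g: (-df[g], g))  # ties broken alphabetically
    return set(ranked[:top_k])


def train_naive_bayes(texts, labels, n=5, top_k=500):
    """Fit a multinomial Naive Bayes model over DF-selected character n-grams."""
    tokenized = [char_ngrams(t, n) for t in texts]
    vocab = select_by_df(tokenized, top_k)
    counts = defaultdict(Counter)
    for tokens, label in zip(tokenized, labels):
        counts[label].update(g for g in tokens if g in vocab)
    return {"n": n, "vocab": vocab, "counts": counts,
            "priors": Counter(labels), "num_docs": len(labels)}


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing assumed)."""
    tokens = [g for g in char_ngrams(text, model["n"]) if g in model["vocab"]]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["priors"].items():
        class_counts = model["counts"][label]
        total = sum(class_counts.values())
        score = math.log(doc_count / model["num_docs"])
        for g in tokens:
            score += math.log((class_counts[g] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Calling `train_naive_bayes(TOY_TEXTS, TOY_LABELS)` and then `predict(model, some_text)` assigns a label to a new fragment. Because the features are raw character sequences, the sketch needs no stemmer or language-specific word list, which is the property the abstract refers to as language independence.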
Pages: 1064 / 1064
Page count: 1
Related papers
3 in total
  • [1] Language independent tokenization vs. stemming in automated detection of health websites' HONcode conformity: An Evaluation
    Boyer, Celia
    Dolamic, Ljiljana
    Falquet, Gilles
    CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015, 2015, 64 : 224 - 231
  • [2] Feasibility of automated detection of HONcode conformity for health-related websites
    Boyer, Celia
    Dolamic, Ljiljana
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (03) : 69 - 74
  • [3] SIDiLDNG: A similarity-based intrusion detection system using improved Levenshtein Distance and N-gram for CAN
    Song, Jiaru
    Qin, Guihe
    Liang, Yanhua
    Yan, Jie
    Sun, Minghui
    COMPUTERS & SECURITY, 2024, 142