Evaluating the relevance of health-related topics using three similarity measures

Cited by: 0
Authors
Zhu, Yifan [1]
Zhang, Jin [2]
Affiliations
[1] Hangzhou Normal Univ, Sch Publ Hlth, 2318 Yuhangtang Rd, Hangzhou 311121, Zhejiang, Peoples R China
[2] Univ Wisconsin Milwaukee, Sch Informat Studies, Milwaukee, WI USA
Keywords
Similarity measures; health topics analysis; medical corpus; semantic linkages; MedlinePlus; MENTAL-HEALTH; INFORMATION-RETRIEVAL; SEMANTIC SIMILARITY; OLDER-ADULTS; MODEL; ENVIRONMENT; NAVIGATION; ACCURACY; INTERNET; CHILDREN;
DOI
10.1177/02666669251316264
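Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract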
This study evaluated the effectiveness of three similarity measures (cosine similarity, Pearson correlation, and Euclidean distance) in assessing health-related topics on MedlinePlus. The focus was on four health topic subcategories: mental health, children, teenagers, and older adults. Using graph-theoretic adjacency matrices and the three similarity measures, the study found that cosine similarity and Pearson correlation were more empirically robust than Euclidean distance. Notably, the agreement between the cosine and Pearson results suggests that the two could be used together as complementary strategies in future research. Hypothesis testing validated these findings: cosine similarity and Pearson correlation were significantly effective in identifying similar health topics and distinguishing between different semantic subgroups, whereas Euclidean distance showed limitations. These insights guide the application of adjacency matrices and the selection of suitable similarity measures for evaluating semantic linkages among health topics, enhancing relevance recognition and supporting classification in medical domains.
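
As a rough illustration of the three measures named in the abstract, the sketch below compares rows of a small adjacency matrix. It is not the authors' implementation: this record does not give their formulas or data, so the textbook definitions are assumed, and the topic names and 0/1 link vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pearson_correlation(u, v):
    """Pearson r, computed as the cosine similarity of mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)

def euclidean_distance(u, v):
    """Straight-line distance; smaller values mean more similar vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical adjacency rows: entry j is 1 if the topic links to topic j.
adjacency = {
    "topic_a": [0, 1, 1, 0, 1],
    "topic_b": [1, 0, 1, 0, 1],
    "topic_c": [0, 0, 0, 1, 0],
}

for first, second in [("topic_a", "topic_b"), ("topic_a", "topic_c")]:
    u, v = adjacency[first], adjacency[second]
    print(first, second,
          round(cosine_similarity(u, v), 3),
          round(pearson_correlation(u, v), 3),
          round(euclidean_distance(u, v), 3))
```

Cosine similarity and Pearson correlation are similarity scores where higher means more alike, while Euclidean distance is a dissimilarity where lower means more alike, so the three should not be ranked on one scale without first converting one form to the other.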
Pages: 20