A Part-Of-Speech term weighting scheme for biomedical information retrieval

Cited by: 22
Authors
Wang, Yanshan [1]
Wu, Stephen [2]
Li, Dingcheng [1]
Mehrabi, Saeed [1]
Liu, Hongfang [1]
Affiliations
[1] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA
[2] Oregon Hlth & Sci Univ, Dept Med Informat & Clin Epidemiol, Portland, OR 97201 USA
Funding
U.S. National Institutes of Health (NIH)
Keywords
Biomedical information retrieval; Natural language processing; Part-Of-Speech; Bag-of-word; Markov random field; RECORDS; MODELS;
DOI
10.1016/j.jbi.2016.08.026
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts from electronic health records (EHRs) and searching the literature for topics of interest are two IR use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-Of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into the bag-of-word (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are calculated iteratively with a cyclic coordinate method, in which a golden section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track of the Text REtrieval Conference (TREC) 2011 and 2012 and from the Genomics track of TREC 2004. The evaluation on the TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for the IR evaluation metrics MAP, bpref, and P@10 are 10.88%, 4.54%, and 3.82%, respectively, compared to the BoW models; for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared to the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to simple heuristic and frequency-based weighting approaches, and we validate our POS category selection. Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks. © 2016 Elsevier Inc. All rights reserved.
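The weight search named in the abstract (a cyclic coordinate method with a golden section line search along each coordinate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the [0, 1] bounds, the stopping rule, and the smooth two-weight `toy_objective` are all assumptions made here. In the paper the objective is MAP computed on retrieval results, which the toy function only stands in for.

```python
import math
from typing import Callable, List, Sequence, Tuple

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section_max(f: Callable[[float], float],
                       lo: float, hi: float, tol: float = 1e-6) -> float:
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def cyclic_coordinate_ascent(objective: Callable[[Sequence[float]], float],
                             init: Sequence[float],
                             bounds: Sequence[Tuple[float, float]],
                             sweeps: int = 20,
                             tol: float = 1e-6) -> List[float]:
    """Optimize one weight at a time, cycling through all coordinates."""
    w = list(init)
    for _ in range(sweeps):
        previous = list(w)
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(x: float, i: int = i) -> float:
                # Vary only coordinate i; hold the other weights fixed.
                trial = list(w)
                trial[i] = x
                return objective(trial)
            w[i] = golden_section_max(along_axis, lo, hi, tol)
        # Stop early once a full sweep no longer moves any weight.
        if max(abs(x - y) for x, y in zip(w, previous)) < tol:
            break
    return w


def toy_objective(w: Sequence[float]) -> float:
    """Hypothetical stand-in for MAP: a concave function of two weights."""
    return -((w[0] - 0.3) ** 2) - (w[1] - 0.7) ** 2 - 0.5 * (w[0] - w[1]) ** 2


weights = cyclic_coordinate_ascent(toy_objective, [0.0, 0.0],
                                   [(0.0, 1.0), (0.0, 1.0)])
print(weights)
```

Because the two toy weights are coupled, a single sweep is not enough and the loop has to cycle over the coordinates several times. A real MAP objective is piecewise constant in the weights rather than smooth, so golden section search on it carries no unimodality guarantee and should be read as a heuristic in this sketch.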
Pages: 379-389
Number of pages: 11