The phrase-based vector space model for automatic retrieval of free-text medical documents

Cited by: 36
Authors
Mao, Wenlei [1 ]
Chu, Wesley W. [1 ]
Affiliation
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Keywords
information storage and retrieval/methods; computing methodologies; vector space model; concept-based vector space model; phrase-based vector space model; information systems; unified medical language system;
DOI
10.1016/j.datak.2006.02.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Objective: To develop a document indexing scheme that improves retrieval effectiveness for free-text medical documents. Design: The phrase-based vector space model (VSM) uses multi-word phrases as indexing terms. Each phrase consists of a concept in the Unified Medical Language System (UMLS) and its corresponding component word stems. The similarity between concepts is defined by their relations in a hypernym hierarchy derived from UMLS. After defining the similarity between two phrases by their stem overlaps and the similarity between the concepts they represent, we define the similarity between two documents as the cosine of the angle between their corresponding phrase vectors. This paper reports the development and validation of the phrase-based VSM. Measurement: We compare the retrieval effectiveness of different vector space models using two standard test collections, OHSUMED and Medlars. OHSUMED contains 105 queries and 14,430 documents, and Medlars contains 30 queries and 1033 documents. Each document in the test collections is judged by human experts to be either relevant or non-relevant to each query. Retrieval effectiveness is measured by precision and recall. Results: The phrase-based VSM is significantly more effective than the current gold standard, the stem-based VSM. These significant improvements in retrieval effectiveness are observed in both exhaustive-search and cluster-based document retrieval. Conclusion: The phrase-based VSM is a better indexing scheme than the stem-based VSM. Medical document retrieval using the phrase-based VSM is significantly more effective than that using the stem-based VSM. © 2006 Elsevier B.V. All rights reserved.
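The Design paragraph of the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the formulas, so the stem-overlap measure (Jaccard), the way stem and concept similarity are combined (taking the maximum), the generalized cosine over phrase vectors, and the concept identifiers and similarity values are all assumptions made for illustration.

```python
import math
from typing import Dict, FrozenSet, Tuple

# A phrase is modeled as (concept id, set of word stems).
# The concept ids below are hypothetical placeholders, not real UMLS identifiers.
Phrase = Tuple[str, FrozenSet[str]]

# Hypothetical concept-to-concept similarities. In the paper these are derived
# from a UMLS hypernym hierarchy; here they are hard-coded for illustration.
CONCEPT_SIM = {frozenset({"C_lung_cancer", "C_lung_neoplasm"}): 0.8}


def concept_similarity(c1: str, c2: str) -> float:
    """Similarity between two concepts (1.0 if identical, table lookup otherwise)."""
    if c1 == c2:
        return 1.0
    return CONCEPT_SIM.get(frozenset({c1, c2}), 0.0)


def stem_overlap(s1: FrozenSet[str], s2: FrozenSet[str]) -> float:
    """Jaccard overlap of two stem sets (an assumed overlap measure)."""
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)


def phrase_similarity(p: Phrase, q: Phrase) -> float:
    """Combine concept and stem similarity; taking the max is an assumption."""
    return max(concept_similarity(p[0], q[0]), stem_overlap(p[1], q[1]))


def _inner(x: Dict[Phrase, float], y: Dict[Phrase, float]) -> float:
    """Inner product of two phrase vectors, weighted by phrase similarity."""
    return sum(
        wx * wy * phrase_similarity(p, q)
        for p, wx in x.items()
        for q, wy in y.items()
    )


def document_similarity(x: Dict[Phrase, float], y: Dict[Phrase, float]) -> float:
    """Cosine-style similarity between two weighted phrase vectors."""
    denom = math.sqrt(_inner(x, x) * _inner(y, y))
    return _inner(x, y) / denom if denom else 0.0


p1: Phrase = ("C_lung_cancer", frozenset({"lung", "cancer"}))
p2: Phrase = ("C_lung_neoplasm", frozenset({"lung", "neoplasm"}))
p3: Phrase = ("C_fracture", frozenset({"bone", "fractur"}))

doc_a = {p1: 1.0}
doc_b = {p2: 1.0}
doc_c = {p3: 1.0}

print(document_similarity(doc_a, doc_b))  # related concepts score above zero
print(document_similarity(doc_a, doc_c))  # unrelated phrases score zero
```

In this sketch two documents that share no identical phrase can still score above zero when their phrases name related concepts or share stems, which is the behavior the abstract attributes to phrase-based indexing as opposed to exact stem matching.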
Pages: 76-92
Number of pages: 17