T-HMM: A Novel Biomedical Text Classifier Based on Hidden Markov Models

Cited by: 8
Authors
Seara Vieira, A. [1]
Iglesias, E. L. [1]
Borrajo, L. [1]
Affiliation
[1] Univ Vigo, Escola Super Enxeneria Informat, Dept Comp Sci, Orense, Spain
Source
8TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS (PACBB 2014) | 2014 / Vol. 294
Keywords
Hidden Markov Model; Text classification; Bioinformatics; DOCUMENTS;
DOI
10.1007/978-3-319-07581-5_27
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
In this paper, we propose an original model for classifying biomedical texts stored in large document corpora. The model classifies scientific documents according to their content using information retrieval techniques and Hidden Markov Models. To demonstrate the efficiency of the model, we present a set of experiments performed on the OHSUMED biomedical corpus, a subset of the MEDLINE database, and on the Allele and GO TREC corpora. Our classifier is also compared with Naive Bayes, k-NN, and SVM techniques. The experiments illustrate the effectiveness of the proposed approach: the results show that the model is comparable to the SVM technique in the classification of biomedical texts.
Pages: 225-234
Page count: 10