T-HMM: A Novel Biomedical Text Classifier Based on Hidden Markov Models

Cited by: 8
Authors
Seara Vieira, A. [1 ]
Iglesias, E. L. [1 ]
Borrajo, L. [1 ]
Affiliations
[1] Univ Vigo, Escola Super Enxeneria Informat, Dept Comp Sci, Orense, Spain
Source
8TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS (PACBB 2014) | 2014 / Vol. 294
Keywords
Hidden Markov Model; Text classification; Bioinformatics; DOCUMENTS;
DOI
10.1007/978-3-319-07581-5_27
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
In this paper, we propose an original model for the classification of biomedical texts stored in large document corpora. The model classifies scientific documents according to their content using information retrieval techniques and Hidden Markov Models. To demonstrate the efficiency of the model, we present a set of experiments performed on the OHSUMED biomedical corpus, a subset of the MEDLINE database, and on the Allele and GO TREC corpora. Our classifier is also compared with Naive Bayes, k-NN and SVM techniques. The experiments illustrate the effectiveness of the proposed approach, and the results show that the model is comparable to the SVM technique in the classification of biomedical texts.
Pages: 225-234
Page count: 10