A Framework for Medical Text Mining using a Feature Weighted Clustering Algorithm

Cited by: 0
Authors
Chakrabarty, Anirban [1 ]
Roy, Santanu [1 ]
Affiliations
[1] Future Inst Engn & Management, Dept MCA, Kolkata, India
Source
2013 1ST INTERNATIONAL CONFERENCE ON EMERGING TRENDS AND APPLICATIONS IN COMPUTER SCIENCE (ICETACS) | 2013
Keywords
Text categorization; minimum spanning tree; clustering; cosine based similarity; weight factor; liver disease
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Text categorization is the task of deciding whether a document belongs to one of a set of prespecified classes of documents. Categorizing documents is challenging because the number of discriminating words can be huge, and many existing text classification algorithms simply do not work with so many words. Traditional text classification algorithms use all training samples for classification, so storage requirements and computational complexity grow as the number of features increases. Mining medical records for relationships between living factors and the symptoms of a disease is an important task; however, there has been relatively little research in this area. The proposed work develops a text classification algorithm in which the cluster centers are taken as the training samples, thereby reducing the sample size, and introduces a weight factor to indicate the relative importance of each training sample. A similarity measure function is then used to classify a new patient document. Experiments on real-life data show that the proposed algorithm outperforms state-of-the-art classification algorithms such as k-nearest neighbor.
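The idea summarized in the abstract (classify against weighted cluster centers instead of every training sample, using a similarity measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the clustering procedure, the weight formula, or the exact similarity function, so the sketch assumes that clusters are already formed, that each center's weight is its cluster's share of the training samples, and that the similarity is plain cosine similarity (suggested by the keywords). The function names and the toy term vectors are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_centers(clusters):
    """Reduce each labelled cluster to one (label, center, weight) training sample.

    clusters: list of (label, list_of_term_vectors).
    ASSUMPTION: weight = cluster size / total samples; the paper's weight
    factor is not specified in the abstract.
    """
    total = sum(len(vectors) for _, vectors in clusters)
    centers = []
    for label, vectors in clusters:
        dim = len(vectors[0])
        center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        centers.append((label, center, len(vectors) / total))
    return centers


def classify(doc, centers):
    """Return the label with the highest summed weight * cosine similarity."""
    scores = defaultdict(float)
    for label, center, weight in centers:
        scores[label] += weight * cosine(doc, center)
    return max(scores, key=scores.get)


# Hypothetical toy data: term counts over three illustrative terms.
clusters = [
    ("liver_disease", [[3, 0, 1], [2, 0, 2]]),
    ("healthy", [[0, 3, 0], [0, 2, 1], [1, 4, 0]]),
]
centers = build_centers(clusters)
print(classify([2, 0, 1], centers))
```

Because only one vector per cluster is kept, classifying a new document costs one similarity computation per cluster rather than one per training sample, which is the storage and computation saving the abstract describes.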
Pages: 135-139
Page count: 5