Predicting the encoding of secondary diagnoses. An experience based on decision trees

Cited by: 1
Authors
Chahbandarian G. [1]
Bricon-Souf N. [1]
Megdiche I. [1]
Bastide R. [1]
Steinbach J.-C. [2]
Affiliations
[1] IRIT/ISIS, Université de Toulouse, CNRS, Castres
[2] Centre Hospitalier Intercommunal de Castres Mazamet, Department of Medical Information, Castres
Source
Ingenierie des Systemes d'Information | 2017 / Vol. 22 / No. 2
Keywords
Coding ICD-10; Data mining; Decision tree; Machine learning; PMSI; Secondary diagnoses;
DOI
10.3166/ISI.22.2.69-94
Abstract
To measure medical activity, hospitals are required to manually encode the diagnoses associated with each inpatient episode using the International Classification of Diseases (ICD-10). This task is time-consuming and requires substantial staff training. In this paper, we propose an approach to speed up and facilitate the tedious manual task of coding patient information, especially for secondary diagnoses that are not well described in medical resources such as discharge letters and medical records. Our approach leverages data mining techniques, specifically decision trees, to explore medical databases that encode knowledge about such diagnoses. It uses the stored structured information (age, gender, diagnoses count, medical procedures, etc.) to build a decision tree that assigns the appropriate secondary diagnosis code to the corresponding inpatient episode. We evaluated our approach on the PMSI database using fine and coarse levels of diagnosis granularity. Three types of experiments were performed using different techniques to balance the datasets. The results show a significant variation in evaluation scores between the different techniques for the same studied diagnoses. We highlight the efficiency of the random sampling techniques regardless of the type of diagnosis and the type of measure (F1-measure, recall and precision). © 2017 Lavoisier.
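The abstract mentions balancing the datasets by random sampling and scoring with precision, recall and F1-measure. The sketch below illustrates one possible reading of those two steps in plain Python; it is not the authors' implementation. The function names, the `has_code` label field and the toy episodes are all hypothetical, and random under-sampling of the majority class is an assumed variant, since the abstract does not say which sampling scheme was used.

```python
import random


def undersample(rows, label_key="has_code", seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    `label_key` is a hypothetical boolean field marking whether the
    episode carries the secondary diagnosis code under study.
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy episodes: 2 with the code, 8 without (imbalanced).
episodes = [
    {"age": 40 + i, "diagnoses_count": i % 4, "has_code": i < 2}
    for i in range(10)
]
balanced = undersample(episodes)
```

A classifier such as a decision tree would then be trained on `balanced` and scored on held-out episodes with `precision_recall_f1`.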
Pages: 69-94
Number of pages: 25