Semi-supervised incremental learning with few examples for discovering medical association rules

Cited by: 3
Authors
Sanchez-de-Madariaga, Ricardo [1, 4]
Martinez-Romo, Juan [2, 4]
Cantero Escribano, Jose Miguel [3]
Araujo, Lourdes [2, 4]
Affiliations
[1] Inst Salud Carlos III, Telemed & eHlth Res Unit, Monforte de Lemos 5, Madrid 28029, Spain
[2] Univ Nacl Educ Distancia, Nat Language Processings & Informat Retrieval Grp, Madrid 28040, Spain
[3] Hosp Univ La Paz Carlos III Cantoblanco, Prevent Med Serv, Madrid 28046, Spain
[4] IMIENS, Inst Mixto UNED ISCIII, Madrid 28029, Spain
Keywords
Medical records; Association rules discovery; Machine learning; Semi-supervised approach; Algorithm
DOI
10.1186/s12911-022-01755-3
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Association rules are one of the main ways to represent the structural patterns underlying raw data. They express dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in predictive health. Classic algorithms for association rule mining generate huge numbers of candidate rules, which must be filtered to select those most likely to be true. Most of the techniques proposed for this filtering are unsupervised, but the accuracy of unsupervised systems is limited. Conversely, annotating data to train supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data.
Methods: We propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting from a small seed of annotated data, the model improves on the F-measure obtained with a fully supervised system (standard supervised ML algorithms). The idea is to exploit, over a series of iterative steps, the agreement between the predictions of the supervised system and those of the unsupervised techniques.
Results: In the task of mining medical association rules, the new semi-supervised ML algorithm improves on the F-measure of supervised algorithms while training with an affordable amount of manually annotated data.
Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. The proposal may be an important step towards the practical development of techniques for mining association rules and generating valuable new scientific medical knowledge.
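The iterative agreement idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the features, or the stopping rule, so the 1-nearest-neighbour stand-in, the one-sided test, the `alpha` threshold, and all function names below are assumptions made only for illustration. Each candidate rule is represented by a hypothetical feature vector and a 2x2 contingency table, and a rule receives a pseudo-label only when the supervised prediction and Fisher's exact test agree.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a = records with antecedent and consequent, b = antecedent only,
    c = consequent only, d = neither. Returns P(X >= a) under the
    hypergeometric null hypothesis of independence.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def nearest_label(features, labeled):
    """Stand-in supervised model (assumption): label of the nearest labeled rule."""
    def squared_distance(item):
        return sum((p - q) ** 2 for p, q in zip(item[0], features))
    return min(labeled, key=squared_distance)[1]


def self_train(seed, pool, alpha=0.05, max_rounds=10):
    """Grow the labeled set with rules on which both predictors agree.

    seed: list of (features, label) pairs annotated by hand.
    pool: list of (features, table) pairs, table = (a, b, c, d).
    Returns the enlarged labeled set and the rules left unlabeled.
    """
    labeled = list(seed)
    remaining = list(pool)
    for _ in range(max_rounds):
        agreed, disagreed = [], []
        for features, table in remaining:
            supervised = nearest_label(features, labeled)
            unsupervised = fisher_exact_greater(*table) < alpha
            if supervised == unsupervised:
                agreed.append((features, supervised))  # pseudo-label
            else:
                disagreed.append((features, table))
        if not agreed:  # no new agreement: stop iterating
            break
        labeled.extend(agreed)
        remaining = disagreed
    return labeled, remaining


if __name__ == "__main__":
    seed = [((0.9,), True), ((0.1,), False)]
    pool = [
        ((0.8,), (5, 0, 0, 5)),   # strong association, close to the positive seed
        ((0.2,), (2, 3, 3, 2)),   # no association, close to the negative seed
        ((0.85,), (2, 3, 3, 2)),  # predictors disagree, so it stays unlabeled
    ]
    labeled, remaining = self_train(seed, pool)
    print(len(labeled), "labeled rules;", len(remaining), "left unlabeled")
```

In a realistic setting the nearest-neighbour stand-in would be replaced by a standard supervised learner retrained at each round, and the features would come from the mined rules themselves; those choices are left open here because the abstract does not state them.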
Pages: 11