A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data

Cited by: 37
Authors
De Mol, Christine [3, 4]
Mosci, Sofia [1, 2]
Traskine, Magali [3]
Verri, Alessandro [1]
Affiliations
[1] Univ Genoa, DISI, Genoa, Italy
[2] Univ Genoa, DIFI, Genoa, Italy
[3] Univ Libre Brussels, Dept Math, Brussels, Belgium
[4] Univ Libre Brussels, ECARES, Brussels, Belgium
Keywords
gene expression; machine learning; recognition of genes and regulatory elements; EXPRESSION DATA; CLASSIFICATION; CANCER; REGRESSION; LASSO; BIOINFORMATICS;
DOI
10.1089/cmb.2008.0171
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Gene expression analysis aims to identify the genes that accurately predict biological parameters such as disease subtype or progression. While accurate prediction can be achieved with many different techniques, gene identification is a much more elusive problem because of gene correlation and the limited number of available samples. Small changes in the expression values often produce different gene lists, and solutions that are both sparse and stable are difficult to obtain. We propose a two-stage regularization method that learns linear models with high prediction performance. By varying a suitable parameter, these linear models make it possible to trade sparsity for the inclusion of correlated genes and to produce gene lists that are almost perfectly nested. Experimental results on synthetic and microarray data confirm these properties of the proposed method and its potential as a starting point for further biological investigations.
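The abstract does not spell out the two stages, so the following is only a minimal sketch under stated assumptions: stage 1 is taken to be an elastic-net-style selection (an l1 penalty plus an l2 penalty, solved by iterative soft-thresholding), and stage 2 a ridge refit restricted to the selected genes. The function names and the parameters `tau`, `mu`, and `lam` are hypothetical and chosen for illustration; they are not taken from the record.

```python
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t; this is what sets small coefficients to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def prox_gradient(X, y, tau, mu, n_iter=300):
    """Iterative soft-thresholding for (1/n)||y - Xb||^2 + tau*||b||_1 + mu*||b||_2^2."""
    n, p = len(X), len(X[0])
    if p == 0:
        return []
    # Conservative step 1/lip: the squared Frobenius norm bounds the top eigenvalue of X^T X.
    lip = 2.0 * sum(x * x for row in X for x in row) / n + 2.0 * mu
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-2.0 / n * sum(X[i][j] * resid[i] for i in range(n)) + 2.0 * mu * b[j]
                for j in range(p)]
        b = [soft_threshold(b[j] - grad[j] / lip, tau / lip) for j in range(p)]
    return b

def two_stage(X, y, tau, mu, lam):
    """Assumed two-stage scheme: sparse selection, then a ridge refit on the selected columns."""
    b = prox_gradient(X, y, tau, mu)                    # stage 1: l1 + l2 selection
    selected = [j for j, v in enumerate(b) if v != 0.0]
    X_sel = [[row[j] for j in selected] for row in X]
    w = prox_gradient(X_sel, y, 0.0, lam)               # stage 2: tau = 0 gives plain ridge descent
    return selected, w

# Synthetic data (illustrative only): genes 0-2 share one latent signal, genes 3-7 are noise.
rng = random.Random(0)
n, p = 30, 8
X, y = [], []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    row = [z + 0.1 * rng.gauss(0.0, 1.0) for _ in range(3)]
    row += [rng.gauss(0.0, 1.0) for _ in range(p - 3)]
    X.append(row)
    y.append(z + 0.1 * rng.gauss(0.0, 1.0))

# Vary the l2 weight mu and inspect which genes survive stage 1.
for mu in (0.0, 0.5, 5.0):
    selected, w = two_stage(X, y, tau=0.1, mu=mu, lam=0.01)
    print(f"mu={mu}: selected genes {selected}")
```

In this sketch, `mu` plays the role of the parameter that trades sparsity for the inclusion of correlated genes. Whether the resulting lists come out nested depends on the data and on the penalty values, so the printed lists should be inspected rather than assumed.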
Pages: 677-690
Number of pages: 14