Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification

Cited by: 43
Authors
Lopez-Rincon, Alejandro [1]
Mendoza-Maldonado, Lucero [2]
Martinez-Archundia, Marlet [3]
Schonhuth, Alexander [4, 5]
Kraneveld, Aletta D. [1]
Garssen, Johan [1, 6]
Tonda, Alberto [7]
Affiliations
[1] Univ Utrecht, Fac Sci, Utrecht Inst Pharmaceut Sci, Div Pharmacol, Univ Weg 99, NL-3584 CG Utrecht, Netherlands
[2] Nuevo Hosp Civil Guadalajara Dr Juan I Menchaca, Salvador Quevedo & Zubieta 750, Guadalajara 44340, Jalisco, Mexico
[3] Inst Politecn Nacl, Lab Modelado Mol Bioinformat & Diseno Farmacos, Escuela Super Med, Secc Estudios Posgrad & Invest, Mexico City 11340, DF, Mexico
[4] Ctr Wiskunde & Informat, Life Sci & Hlth, Sci Pk 123, NL-1098 XG Amsterdam, Netherlands
[5] Bielefeld Univ, Fac Technol, Genome Data Sci, Univ Str 25, D-33615 Bielefeld, Germany
[6] Global Ctr Excellence Immunol Danone Nutricia Res, Uppsalaan 12, NL-3584 CT Utrecht, Netherlands
[7] Univ Paris Saclay, INRAE, UMR 518 MIA Paris, F-75013 Paris, France
Keywords
miRNAs; TNBC; machine learning; feature selection; circulating; ESTROGEN-RECEPTOR-ALPHA; TAMOXIFEN RESISTANCE; LUNG ADENOCARCINOMA; NONCODING RNAS; POOR-PROGNOSIS; CELL-GROWTH; EXPRESSION; DIAGNOSIS; PROGRESSION; ASSOCIATION;
DOI
10.3390/cancers12071785
Chinese Library Classification (CLC) number
R73 [Oncology];
Subject classification code
100214;
Abstract
Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors, both to assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult for medical experts to assess and interpret because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for the inherent biases of individual classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and is later assessed on a cancer subtype classification task, with the aim of distinguishing triple-negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.
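The recursive feature elimination over a heterogeneous ensemble described in the abstract can be illustrated roughly as follows. This is a toy sketch under stated assumptions, not the authors' implementation: the three scorers (`logistic_importance`, `perceptron_importance`, `centroid_importance`), the rank-sum aggregation, the `step` size, and the synthetic data are all hypothetical choices made to keep the example dependency-free.

```python
import math
import random

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def logistic_importance(X, y, epochs=200, lr=0.1):
    """|weight| per feature from logistic regression fit by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label
            gb += err
            for j, xj in enumerate(row):
                gw[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return [abs(wj) for wj in w]

def perceptron_importance(X, y, epochs=20):
    """|weight| per feature from a plain perceptron."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            pred = 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
            if pred != label:
                sign = 1 if label == 1 else -1
                w = [wj + sign * xj for wj, xj in zip(w, row)]
                b += sign
    return [abs(wj) for wj in w]

def centroid_importance(X, y):
    """Per-feature gap between class centroids (nearest-centroid rule)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [abs(sum(c1) / len(pos) - sum(c0) / len(neg))
            for c1, c0 in zip(zip(*pos), zip(*neg))]

def ensemble_rfe(X, y, n_keep, step=2,
                 models=(logistic_importance, perceptron_importance,
                         centroid_importance)):
    """Drop the features the ensemble ranks worst until n_keep remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        rank_sum = [0] * len(active)
        for model in models:  # every model is refit on the surviving features
            imp = model(sub, y)
            order = sorted(range(len(active)), key=lambda i: -imp[i])
            for rank, i in enumerate(order):
                rank_sum[i] += rank  # rank 0 = most important
        n_drop = min(step, len(active) - n_keep)
        worst = sorted(range(len(active)), key=lambda i: -rank_sum[i])[:n_drop]
        active = [f for i, f in enumerate(active) if i not in worst]
    return active

# Synthetic two-class data: only features 0, 1, 2 carry signal (assumption).
random.seed(0)
n_samples, n_features, informative = 60, 12, [0, 1, 2]
y = [i % 2 for i in range(n_samples)]
X = standardize([[random.gauss(3.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)] for label in y])
selected = ensemble_rfe(X, y, n_keep=3)
print(selected)
```

Because each model is retrained after every elimination round, a feature's importance can change as its correlated neighbours disappear, which is what distinguishes recursive elimination from a single ranking pass. A practical version would more likely replace the hand-written scorers with library classifiers and validate the selected subset on held-out data.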
Pages: 1 / 27
Page count: 26