DISA tool: Discriminative and informative subspace assessment with categorical and numerical outcomes

Cited by: 3
Authors
Alexandre, Leonardo [1, 2, 3]
Costa, Rafael S. [3, 4]
Henriques, Rui [1, 2]
Affiliations
[1] INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
[3] Univ NOVA Lisboa, NOVA Sch Sci & Technol, Dept Chem, LAQV REQUIMTE, Caparica, Portugal
[4] Univ Lisbon, Inst Super Tecn, IDMEC, Lisbon, Portugal
DOI
10.1371/journal.pone.0276253
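CLC classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09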
Abstract
Pattern discovery and subspace clustering play a central role in the biological domain, supporting, for instance, the discovery of putative regulatory modules from omics data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy well-delineated discriminative power properties. These properties are well established for categorical outcomes, yet largely disregarded for numerical outcomes such as risk profiles and quantitative phenotypes. DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to evaluate patterns in the presence of numerical outcomes using well-established measures together with a novel principle that statistically assesses the correlation gain of the subspace against the overall space. Results confirm that discriminative criteria can be soundly extended to numerical outcomes without the drawbacks commonly associated with discretization procedures. Results from four case studies confirm the validity and relevance of the proposed methods, further unveiling critical directions for research in biotechnology and biomedicine. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.
Pages: 19