DISA tool: Discriminative and informative subspace assessment with categorical and numerical outcomes

Cited by: 3
Authors
Alexandre, Leonardo [1, 2, 3]
Costa, Rafael S. [3, 4]
Henriques, Rui [1, 2]
Affiliations
[1] INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
[3] Univ NOVA Lisboa, NOVA Sch Sci & Technol, Dept Chem, LAQV REQUIMTE, Caparica, Portugal
[4] Univ Lisbon, Inst Super Tecn, IDMEC, Lisbon, Portugal
Keywords
DOI
10.1371/journal.pone.0276253
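Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract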
Pattern discovery and subspace clustering play a central role in the biological domain, supporting, for instance, the discovery of putative regulatory modules from omics data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy delineated discriminative power properties, which are well established for categorical outcomes yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to evaluate patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm that discriminative criteria can be soundly extended towards numerical outcomes without the drawbacks commonly associated with discretization procedures. Results from four case studies confirm the validity and relevance of the proposed methods, further unveiling critical directions for research on biotechnology and biomedicine. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.
Pages: 19