Feasibility of Active Machine Learning for Multiclass Compound Classification

Cited by: 30
Authors
Lang, Tobias [1, 2]
Flachsenberg, Florian [1]
von Luxburg, Ulrike [3]
Rarey, Matthias [1]
Affiliations
[1] Univ Hamburg, Ctr Bioinformat, D-20146 Hamburg, Germany
[2] Univ Hamburg, Dept Comp Sci, Schluterstr 70, D-20146 Hamburg, Germany
[3] Univ Tubingen, Dept Comp Sci, D-72076 Tubingen, Germany
Keywords
DISCOVERY; TOOL;
DOI
10.1021/acs.jcim.5b00332
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
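The iterative selection of informative training compounds described in the abstract can be illustrated with a generic pool-based active learning loop. The following is a minimal sketch and not the authors' method: the nearest-centroid classifier, the margin-based query rule, the synthetic 2D "descriptors", and the names `centroid_fit`, `margin`, `active_learn`, and `oracle` are all assumptions made for illustration. The `oracle` callback stands in for the human expert or experiment that supplies a class label on request.

```python
import math
import random


def centroid_fit(X, y):
    """Fit a nearest-centroid model: return {label: mean vector}."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }


def distances(model, x):
    """Sorted (distance, label) pairs from x to every class centroid."""
    return sorted((math.dist(x, c), label) for label, c in model.items())


def predict(model, x):
    """Label of the closest centroid."""
    return distances(model, x)[0][1]


def margin(model, x):
    """Gap between the two closest centroids; small means uncertain."""
    d = distances(model, x)
    return d[1][0] - d[0][0] if len(d) > 1 else 0.0


def active_learn(pool, oracle, seed_idx, budget):
    """Query `budget` labels, always asking about the least certain item."""
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        model = centroid_fit([pool[i] for i in labeled], list(labeled.values()))
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        query = min(candidates, key=lambda i: margin(model, pool[i]))
        labeled[query] = oracle(query)  # hypothetical expert feedback
    model = centroid_fit([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


# Synthetic stand-in for compound descriptors: three 2D clusters.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pool, truth = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        pool.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        truth.append(label)

# One seed compound per class, then nine queried labels.
model, labeled = active_learn(pool, lambda i: truth[i], [0, 30, 60], budget=9)
accuracy = sum(predict(model, x) == t for x, t in zip(pool, truth)) / len(pool)
print(f"labels used: {len(labeled)} of {len(pool)}, accuracy: {accuracy:.2f}")
```

In this sketch only 12 of the 90 pool items are ever labeled. The margin rule is one common uncertainty criterion among several; other selection strategies or classifiers could be substituted without changing the loop structure.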
Pages: 12-20
Number of pages: 9