Industry-scale application and evaluation of deep learning for drug target prediction

Cited by: 29

Authors:

Sturm, Noe ^{[1]}

Mayr, Andreas ^{[2,3]}

Thanh Le Van ^{[4]}

Chupakhin, Vladimir ^{[5]}

Ceulemans, Hugo ^{[4]}

Wegner, Joerg ^{[4]}

Golib-Dzib, Jose-Felipe ^{[6]}

Jeliazkova, Nina ^{[7]}

Vandriessche, Yves ^{[8]}

Bohm, Stanislav ^{[9]}

Cima, Vojtech ^{[9]}

Martinovic, Jan ^{[9]}

Greene, Nigel ^{[1]}

Vander Aa, Tom ^{[10]}

Ashby, Thomas J. ^{[10]}

Hochreiter, Sepp ^{[2,3]}

Engkvist, Ola ^{[11]}

Klambauer, Guenter ^{[2,3]}

Chen, Hongming ^{[11]}

Affiliations:

[1] AstraZeneca, R&D Biopharmaceut, Clin Pharmacol & Safety Sci, Pepparedsleden 1, S-43183 Molndal, Sweden

[2] Johannes Kepler Univ Linz, LIT AI Lab, Altenberger Str 69, A-4040 Linz, Austria

[3] Johannes Kepler Univ Linz, Inst Machine Learning, Altenberger Str 69, A-4040 Linz, Austria

[4] Janssen Pharmaceut, High Dimens Biol & Discovery Data Sci Discovery S, Turnhoutseweg 30, B-2349 Beerse, Belgium

[5] Janssen R&D, High Dimens Biol & Discovery Data Sci Discovery S, 1400 McKean Rd, Spring House, PA 19002 USA

[6] Janssen Cilag SA, High Dimens Biol & Discovery Data Sci Discovery S, Calle Rio Jarama,75A, Toledo 45007, Spain

[7] Idea Consult Ltd, Veldkant 31, Sofia 1000, Bulgaria

[8] Intel Corp, Data Ctr Grp, Veldkant 31, B-2550 Kontich, Belgium

[9] VSB Tech Univ Ostrava, IT4Innovat, 17 Listopadu 2172-15, Ostrava 70800, Czech Republic

[10] IMEC, Exasci Lab, Kapeldreef 75, B-3001 Louvain, Belgium

[11] AstraZeneca, R&D Biopharmaceut, Hit Discovery Discovery Sci, Pepparedsleden 1, S-43183 Molndal, Sweden

Source:

JOURNAL OF CHEMINFORMATICS | 2020 / Volume 12 / Issue 01

Funding:

European Union Horizon 2020;

Keywords:

QSAR; Deep learning; Machine learning; Structure-based virtual screening; Cheminformatics; Big data; ChEMBL; PubChem; Prospective evaluation; Retrospective evaluation; SIMILARITY; AGREEMENT;

DOI:

10.1186/s13321-020-00428-5

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent work on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data might differ from that of industry data because of different labs reporting measurements, different measurement techniques, fewer samples, and less diverse and specialized assays. As part of a European-funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that models derived by deep learning outperformed comparable models trained by other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study that evaluates the potential of machine learning, and especially deep learning, directly at the level of industry-scale settings and that also investigates the transferability of publicly learned target prediction models to industrial bioactivity prediction pipelines.

Pages: 13
