Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications

Cited by: 14
Authors
Tarasova, Olga A. [1]
Biziukova, Nadezhda Yu [1]
Filimonov, Dmitry A. [1]
Poroikov, Vladimir V. [1]
Nicklaus, Marc C. [2]
Affiliations
[1] Inst Biomed Chem, Dept Bioinformat, 10 Bldg 8,Pogodinskaya St, Moscow 119121, Russia
[2] NCI, Comp Aided Drug Design Grp, Chem Biol Lab, Ctr Canc Res, Frederick, MD 21702 USA
Funding
Russian Foundation for Basic Research
Keywords
DRUG; GENE;
DOI
10.1021/acs.jcim.9b00164
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701 ;
Abstract
A lot of high quality data on the biological activity of chemical compounds are required throughout the whole drug discovery process: from development of computational models of the structure-activity relationship to experimental testing of lead compounds and their validation in clinics. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Despite the existence of free and commercially available databases of biological activities of compounds, they usually lack unambiguous information about peculiarities of biological assays. On the other hand, scientific papers are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extraction of text fragments containing description of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that categorization of papers into relevant and irrelevant may be performed based on the machine-learning analysis of the abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the peculiarities of bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.
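The relevant/irrelevant categorization of abstracts described above can be illustrated with a minimal sketch. The abstract does not name the classifier or the features the authors used, so a multinomial naive Bayes model over word counts is swapped in here purely as an assumption, and the four training texts, the labels, and the class name `NaiveBayesTextClassifier` are hypothetical toy examples, not data or code from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Illustrative stand-in only: the paper's actual model is not
    specified in the abstract.
    """

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # all tokens seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (not taken from the paper).
train_texts = [
    "IC50 values were measured in MT-4 cells using an antiviral assay",
    "Cytotoxicity CC50 of the compound was determined in a cell-based assay",
    "We review the history of funding policy for biomedical research",
    "A survey of patient attitudes toward clinic waiting times",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("The inhibitor showed an IC50 of 5 nM in the cell assay"))
```

A real pipeline would need a much larger labeled set of abstracts and a held-out validation split; the sketch only shows the shape of the classification step.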
Pages: 3635-3644
Number of pages: 10
Related papers
42 in total
[1]  
Agarwala R, 2016, NUCLEIC ACIDS RES, V44, pD7, DOI [10.1093/nar/gkv1290, 10.1093/nar/gku1130]
[2]   A Multiple siRNA-Based Anti-HIV/SHIV Microbicide Shows Protection in Both In Vitro and In Vivo Models [J].
Boyapalle, Sandhya ;
Xu, Weidong ;
Raulji, Payal ;
Mohapatra, Subhra ;
Mohapatra, Shyam S. .
PLOS ONE, 2015, 10 (09)
[3]   Chemotext: A Publicly Available Web Server for Mining Drug-Target-Disease Relationships in PubMed [J].
Capuzzi, Stephen J. ;
Thornton, Thomas E. ;
Liu, Kammy ;
Baker, Nancy ;
Lam, Wai In ;
O'Banion, Colin P. ;
Muratov, Eugene N. ;
Pozefsky, Diane ;
Tropsha, Alexander .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2018, 58 (02) :212-218
[4]  
Carpenter B., 2007, P 2 BIOCREATIVE WORK
[5]   BioAssay templates for the semantic web [J].
Clark, Alex M. ;
Litterman, Nadia K. ;
Kranz, Janice E. ;
Gund, Peter ;
Gregory, Kellan ;
Bunin, Barry A. .
PEERJ COMPUTER SCIENCE, 2016, 2016 (05)
[6]   Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation [J].
Clark, Alex M. ;
Bunin, Barry A. ;
Litterman, Nadia K. ;
Schuerer, Stephan C. ;
Visser, Ubbo .
PEERJ, 2014, 2
[7]   How Consistent are Publicly Reported Cytotoxicity Data? Large-Scale Statistical Analysis of the Concordance of Public Independent Cytotoxicity Measurements [J].
Cortes-Ciriano, Isidro ;
Bender, Andreas .
CHEMMEDCHEM, 2016, 11 (01) :57-71
[8]   NCBI disease corpus: A resource for disease name recognition and concept normalization [J].
Dogan, Rezarta Islamaj ;
Leaman, Robert ;
Lu, Zhiyong .
JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 47 :1-10
[9]   Advancing Science through Mining Libraries, Ontologies, and Communities [J].
Evans, James A. ;
Rzhetsky, Andrey .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2011, 286 (27) :23659-23666
[10]   Exploring the boundaries: gene and protein identification in biomedical text [J].
Finkel, J ;
Dingare, S ;
Manning, CD ;
Nissim, M ;
Alex, B ;
Grover, C .
BMC BIOINFORMATICS, 2005, 6 (Suppl 1)