ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed

Cited by: 5
Authors
Turina, Paola [1]
Fariselli, Piero [2]
Capriotti, Emidio [1]
Affiliations
[1] Univ Bologna, Dept Pharm & Biotechnol FaBiT, Bologna, Italy
[2] Univ Torino, Dept Med Sci, Turin, Italy
Keywords
protein stability; text mining; document classification; automated literature mining; thermodynamic data; EXTRACTION;
DOI
10.3389/fmolb.2021.620475
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In recent years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from the literature is a highly time-consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on thermodynamic data extracted from the literature. In the past, such data were deposited in the ProTherm database, which, however, has not been maintained since 2013. To facilitate the collection of protein thermodynamic data from the literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text-mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empirical score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database and tested on a new curated set of articles, manually selected for the presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts.
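The regular-expression scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the published ThermoScan implementation: the term lists, the weights, the threshold, and the function names `count_hits`, `score`, and `is_candidate` are all hypothetical, chosen only to show how matches for conceptual words, thermodynamic variables, and units might be combined into a single empirical score.

```python
import re

# Hypothetical term groups; the actual ThermoScan patterns are not given in the abstract.
PATTERNS = {
    # Conceptual words typical of protein stability experiments (assumed list).
    "concept": re.compile(
        r"\b(?:unfolding|denaturation|thermal stability|thermostability)\b",
        re.IGNORECASE,
    ),
    # Thermodynamic variables; the lookarounds avoid matching inside longer words.
    "variable": re.compile(r"(?<![A-Za-z])(?:ΔΔG|ΔG|ΔH|Tm)(?![A-Za-z])"),
    # Units of measure (assumed list).
    "unit": re.compile(r"(?:kcal|kJ)\s?/\s?mol|°C"),
}

# Hypothetical weights: variables and units count more than generic concept words.
WEIGHTS = {"concept": 1.0, "variable": 2.0, "unit": 2.0}


def count_hits(text):
    """Count non-overlapping matches of each term group in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def score(text):
    """Weighted sum of match counts, standing in for an empirical score."""
    hits = count_hits(text)
    return sum(WEIGHTS[name] * n for name, n in hits.items())


def is_candidate(text, threshold=5.0):
    """Flag a document as likely to report stability data (threshold is assumed)."""
    return score(text) >= threshold


if __name__ == "__main__":
    relevant = "Thermal unfolding gave a Tm of 62 °C and ΔΔG of 1.5 kcal/mol."
    unrelated = "We sequenced the genome of a soil bacterium."
    print(count_hits(relevant), score(relevant), is_candidate(relevant))
    print(count_hits(unrelated), score(unrelated), is_candidate(unrelated))
```

In practice the weights and threshold would be tuned on a labeled set of articles, as the abstract describes for the ProTherm publications, rather than fixed by hand as they are here.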
Pages: 7