mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics

Cited by: 1
Authors
Mirarchi, Antonio [1]
Giorgino, Toni [2]
De Fabritiis, Gianni [1, 3, 4]
Affiliations
[1] Univ Pompeu Fabra, Computat Sci Lab, Barcelona Biomed Res Pk PRBB, Carrer Dr Aiguader 88, Barcelona 08003, Spain
[2] Natl Res Council CNR, Biophys Inst IBF, Via Celoria 26, I-20133 Milan, Italy
[3] Institucio Catalana Recerca i Estudis Avancats ICR, Passeig Lluis Co 23, Barcelona 08010, Spain
[4] Acellera Labs, Doctor Trueta 183, Barcelona 08005, Spain
Funding
National Institutes of Health (NIH), USA
Keywords
MOLECULAR-DYNAMICS SIMULATIONS; CATH;
DOI
10.1038/s41597-024-04140-z
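Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes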
07 ; 0710 ; 09 ;
Abstract
Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and interactions. To address this critical gap, we introduce mdCATH, a dataset generated through an extensive set of all-atom molecular dynamics simulations of a diverse and representative collection of protein domains. This dataset comprises all-atom systems for 5,398 domains, modeled with a state-of-the-art classical force field, and simulated in five replicates each at five temperatures from 320 K to 450 K. The mdCATH dataset records coordinates and forces every 1 ns, for over 62 ms of accumulated simulation time, effectively capturing the dynamics of the various classes of domains and providing a unique resource for proteome-wide statistical analyses of protein unfolding thermodynamics and kinetics. We outline the dataset structure and showcase its potential through four easily reproducible case studies, highlighting its capabilities in advancing protein science.
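The figures quoted in the abstract imply a few derived quantities that are easy to check. The short Python sketch below works them out from the abstract's numbers alone (5,398 domains, five replicates, five temperatures, over 62 ms in total, one frame per 1 ns). The function and variable names are illustrative and not part of any mdCATH tooling, and because the abstract says "over 62 ms", the mean trajectory length and frame count should be read as rough lower bounds rather than exact dataset statistics.

```python
# Back-of-the-envelope arithmetic using only the figures in the abstract.
# Names here are hypothetical helpers, not an mdCATH API.

N_DOMAINS = 5_398        # domains in the dataset
N_REPLICAS = 5           # replicates per domain and temperature
N_TEMPERATURES = 5       # temperatures between 320 K and 450 K
TOTAL_TIME_MS = 62.0     # "over 62 ms" accumulated time (lower bound)
SAVE_INTERVAL_NS = 1.0   # coordinates and forces saved every 1 ns

NS_PER_MS = 1_000_000    # 1 ms = 10^6 ns


def dataset_summary(
    n_domains: int = N_DOMAINS,
    n_replicas: int = N_REPLICAS,
    n_temperatures: int = N_TEMPERATURES,
    total_time_ms: float = TOTAL_TIME_MS,
    save_interval_ns: float = SAVE_INTERVAL_NS,
) -> dict:
    """Derive trajectory count, mean trajectory length, and frame count."""
    trajectories = n_domains * n_replicas * n_temperatures
    total_time_ns = total_time_ms * NS_PER_MS
    return {
        "trajectories": trajectories,
        "mean_length_ns": total_time_ns / trajectories,
        "total_frames": round(total_time_ns / save_interval_ns),
    }


summary = dataset_summary()
print(f"trajectories:        {summary['trajectories']:,}")
print(f"mean length (ns):    {summary['mean_length_ns']:.0f} (lower bound)")
print(f"total frames:        {summary['total_frames']:,} (lower bound)")
```

Under these assumptions the dataset holds 134,950 independent trajectories averaging somewhat under half a microsecond each.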
Pages: 11