mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics

Cited by: 1
Authors
Mirarchi, Antonio [1 ]
Giorgino, Toni [2 ]
De Fabritiis, Gianni [1 ,3 ,4 ]
Affiliations
[1] Univ Pompeu Fabra, Computat Sci Lab, Barcelona Biomed Res Pk PRBB, Carrer Dr Aiguader 88, Barcelona 08003, Spain
[2] Natl Res Council CNR, Biophys Inst IBF, Via Celoria 26, I-20133 Milan, Italy
[3] Institucio Catalana Recerca i Estudis Avancats ICREA, Passeig Lluis Companys 23, Barcelona 08010, Spain
[4] Acellera Labs, Doctor Trueta 183, Barcelona 08005, Spain
Funding
National Institutes of Health (NIH);
Keywords
MOLECULAR-DYNAMICS SIMULATIONS; CATH;
DOI
10.1038/s41597-024-04140-z
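Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];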
Subject classification codes
07; 0710; 09;
Abstract
Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and interactions. To address this critical gap, we introduce mdCATH, a dataset generated through an extensive set of all-atom molecular dynamics simulations of a diverse and representative collection of protein domains. This dataset comprises all-atom systems for 5,398 domains, modeled with a state-of-the-art classical force field, and simulated in five replicates each at five temperatures from 320 K to 450 K. The mdCATH dataset records coordinates and forces every 1 ns, for over 62 ms of accumulated simulation time, effectively capturing the dynamics of the various classes of domains and providing a unique resource for proteome-wide statistical analyses of protein unfolding thermodynamics and kinetics. We outline the dataset structure and showcase its potential through four easily reproducible case studies, highlighting its capabilities in advancing protein science.
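The abstract gives enough figures to estimate the size of the dataset. The sketch below derives the number of trajectories (domains x temperatures x replicates), the average simulated time per trajectory, and the number of saved frames at the 1 ns recording interval. It relies only on the numbers quoted above. Because the abstract says "over 62 ms", the time-derived values are lower bounds. Per-trajectory lengths are not given, so the result is a mean, not a per-trajectory length. The function name `dataset_scale` is hypothetical and does not belong to any mdCATH tooling.

```python
# Back-of-the-envelope sizing of mdCATH from the figures quoted in the abstract.
# Only the constants below come from the abstract; everything else is derived.

N_DOMAINS = 5_398          # protein domains
N_TEMPERATURES = 5         # temperatures between 320 K and 450 K
N_REPLICATES = 5           # replicates per temperature
TOTAL_TIME_NS = 62e6       # "over 62 ms" accumulated (1 ms = 1e6 ns), a lower bound
SAVE_INTERVAL_NS = 1.0     # coordinates and forces recorded every 1 ns


def dataset_scale(n_domains: int, n_temps: int, n_reps: int,
                  total_ns: float, save_ns: float) -> dict:
    """Hypothetical helper: trajectory count, mean trajectory length, frame count."""
    n_traj = n_domains * n_temps * n_reps
    return {
        "trajectories": n_traj,
        # Average only: individual trajectory lengths are not stated in the abstract.
        "mean_ns_per_trajectory": total_ns / n_traj,
        # One saved frame (coordinates + forces) per recording interval.
        "frames": int(total_ns / save_ns),
    }


if __name__ == "__main__":
    scale = dataset_scale(N_DOMAINS, N_TEMPERATURES, N_REPLICATES,
                          TOTAL_TIME_NS, SAVE_INTERVAL_NS)
    print(f"trajectories: {scale['trajectories']:,}")
    print(f"mean length (lower bound): {scale['mean_ns_per_trajectory']:.1f} ns")
    print(f"frames (lower bound): {scale['frames']:,}")
```

This yields 134,950 trajectories with a mean of roughly 459 ns each and at least 62 million saved frames, each carrying both coordinates and forces.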
Pages: 11