IDSL.UFA Assigns High-Confidence Molecular Formula Annotations for Untargeted LC/HRMS Data Sets in Metabolomics and Exposomics

Cited by: 4
Authors
Baygi, Sadjad Fakouri [1]
Banerjee, Sanjay K. [2,3]
Chakraborty, Praloy [2]
Kumar, Yashwant [2]
Barupal, Dinesh Kumar [1]
Affiliations
[1] Icahn Sch Med Mt Sinai, Dept Environm Med & Publ Hlth, New York, NY 10029 USA
[2] Translat Hlth Sci & Technol Inst, Noncommunicable Dis Div, Faridabad 121001, Haryana, India
[3] Natl Inst Pharmaceut Educ & Res, Dept Biotechnol, Gauhati 781101, Assam, India
Keywords
QUANTIFICATION; DISCOVERY;
DOI
10.1021/acs.analchem.2c00563
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Untargeted liquid chromatography/high-resolution mass spectrometry (LC/HRMS) assays in metabolomics and exposomics aim to characterize the small molecule chemical space in a biospecimen. To gain maximum biological insights from these data sets, LC/HRMS peaks should be annotated with chemical and functional information including molecular formula, structure, chemical class, and metabolic pathways. Among these, molecular formulas may be assigned to LC/HRMS peaks through matching theoretical and observed isotopic profiles (MS1) of the underlying ionized compound. For this, we have developed the Integrated Data Science Laboratory for Metabolomics and Exposomics-United Formula Annotation (IDSL.UFA) R package. In the untargeted metabolomics validation tests, IDSL.UFA ranked the true positive molecular formula as the top hit for 54.31-85.51% of annotations and within the top five hits for 90.58-100%. Molecular formula annotations were also supported by tandem mass spectrometry data. We have implemented new strategies to (1) generate formula sources and their theoretical isotopic profiles, (2) optimize the formula hits ranking for the individual and aligned peak lists, and (3) scale IDSL.UFA-based workflows for studies with larger sample sizes. Annotating the raw data for a publicly available pregnancy metabolome study using IDSL.UFA highlighted hundreds of new pregnancy-related compounds and also suggested the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens. IDSL.UFA is useful for human metabolomics and exposomics studies where we need to minimize the loss of biological insights in untargeted LC/HRMS data sets. The IDSL.UFA package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.UFA. Detailed documentation and tutorials are also provided at www.ufa.idsl.me.
Pages: 13315-13322
Page count: 8