MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature

Cited by: 4
Authors
Saberian, Nafiseh [1]
Shafi, Adib [1]
Peyvandipour, Azam [1]
Draghici, Sorin [1,2]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[2] Wayne State Univ, Dept Obstet & Gynecol, Detroit, MI 48202 USA
Funding
National Science Foundation (USA)
Keywords
ACUTE MYELOID-LEUKEMIA; BREAST-CANCER; ESR1; MUTATIONS; POINT MUTATIONS; PIK3CA GENE; EXTRACTION; PROTEIN; DATABASE; BRCA2; KNOWLEDGEBASE
DOI
10.1038/s41598-020-68649-0
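Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)]
Subject classification codes
07; 0710; 09
Abstract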
Despite ongoing efforts to develop and maintain accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curating the literature to extract this information is challenging because of: (i) the complexity of natural language processing, (ii) inconsistent use of the standard recommendations for variant description, and (iii) the lack of clarity and consistency in how variant-genotype-phenotype associations are described. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based, variant-driven gene panel for a given condition. We validate the identified genes by showing their ability to predict patients' clinical outcomes on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from ClinVar, Mastermind and the literature, as well as with a panel identified by a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework perform better than the other gene panels currently available in the literature.
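The abstract does not spell out MAGPEL's extraction rules, so the following is only a minimal sketch of the general idea of turning literature text into a ranked, variant-driven gene list. It assumes that one sentence mentioning the condition, a gene symbol from a supplied list, and a protein-level point mutation (e.g. "R882H" or "p.Arg882His") counts as one piece of evidence for that gene. The regular expressions, the function `rank_panel_genes`, its `min_support` parameter, and the toy sentences are all hypothetical illustrations, not the authors' implementation; the word cloud analysis is not modelled.

```python
import re
from collections import Counter

# Hypothetical amino-acid alphabets for protein-level point mutations such as
# "V600E", "p.R882H" or "p.Arg882His" (one- or three-letter codes).
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
POINT_MUTATION = re.compile(
    rf"\b(?:p\.)?(?:(?:{AA3})\d+(?:{AA3})|[{AA1}]\d+[{AA1}])\b"
)
# Candidate gene symbols: an uppercase letter followed by uppercase letters/digits.
GENE_TOKEN = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def rank_panel_genes(sentences, genes, disease, min_support=1):
    """Hypothetical evidence counter: for each gene in `genes`, count the
    sentences that mention `disease`, a point mutation and that gene.
    Returns (gene, count) pairs sorted by count, then alphabetically."""
    counts = Counter()
    for sentence in sentences:
        if disease.lower() not in sentence.lower():
            continue  # sentence does not mention the condition
        if not POINT_MUTATION.search(sentence):
            continue  # no variant mention, so no variant-gene evidence
        for gene in set(GENE_TOKEN.findall(sentence)) & genes:
            counts[gene] += 1
    # Sort deterministically: highest support first, ties broken by gene name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_support]


# Toy input sentences (illustrative only).
sentences = [
    "DNMT3A R882H mutations are frequent in acute myeloid leukemia.",
    "FLT3 D835Y was detected in AML patients and in acute myeloid leukemia cohorts.",
    "NPM1 is overexpressed in acute myeloid leukemia.",  # no variant: skipped
    "FLT3 D835Y and NPM1 mutations co-occur in acute myeloid leukemia.",
    "TP53 R175H is common in breast cancer.",  # other condition: skipped
]

if __name__ == "__main__":
    panel = rank_panel_genes(
        sentences, {"DNMT3A", "FLT3", "NPM1", "TP53"}, "acute myeloid leukemia"
    )
    print(panel)  # [('FLT3', 2), ('DNMT3A', 1), ('NPM1', 1)]
```

Sorting on (−count, gene) keeps the output stable even though set iteration order over strings varies between Python runs; raising `min_support` trims the panel to genes backed by more sentences.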
Pages: 11