MAFin: motif detection in multiple alignment files

Citations: 0
Authors
Patsakis, Michail [1, 2]
Provatas, Kimonas [1, 2, 3]
Baltoumas, Fotis A. [4]
Chantzi, Nikol [1, 2]
Mouratidis, Ioannis [1, 2]
Pavlopoulos, Georgios A. [4]
Georgakopoulos-Soares, Ilias [1, 2]
Affiliations
[1] Penn State Univ, Coll Med, Inst Personalized Med, Dept Mol Biol & Pharmacol, 500 Univ Dr,C5716, Hershey, PA 17033 USA
[2] Penn State Univ, Huck Inst Life Sci, University Pk, PA 16802 USA
[3] Univ Crete, Div Basic Sci, Med Sch, Iraklion 71110, Greece
[4] BSRC Alexander Fleming, Inst Fundamental Biomed Res, Vari 16672, Greece
Funding
National Institutes of Health (USA)
Keywords
ELEMENTS; VERTEBRATE; INSECT;
DOI
10.1093/bioinformatics/btaf125
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Whole-genome and whole-proteome alignments, represented in the Multiple Alignment Format (MAF), have become a standard approach in comparative genomics and proteomics. Analyses of these alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a tool that enables efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.
Results: We developed MAFin, the first motif detection tool for MAF files. MAFin enables multithreaded search for conserved motifs using three approaches: (i) user-specified k-mers; (ii) regular expressions, in which case one or more patterns are searched; and (iii) predefined Position Weight Matrices. Once a motif has been found, MAFin detects the motif instances and calculates their conservation across the aligned sequences. It reports a conservation percentage for each motif, based on the number of matches relative to the length of the motif. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses.
Availability and implementation: MAFin is offered as a multi-platform Python package under the GPL license and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.
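The conservation percentage described in the abstract (matches relative to motif length) can be illustrated with a small sketch. This is not MAFin's code or API: the function names, the dictionary-based alignment block, and the choice to count a gap in an aligned row as a mismatch are all assumptions made for illustration. It covers only the regular-expression search mode.

```python
import re


def conservation_percent(ref_window, other_window):
    """Percent of motif columns where the other row equals the reference.

    Assumption: a gap ('-') in the other row counts as a mismatch.
    """
    matches = sum(a.upper() == b.upper() for a, b in zip(ref_window, other_window))
    return 100.0 * matches / len(ref_window)


def find_conserved_motifs(block, reference, pattern):
    """Search `pattern` in the ungapped reference row of one alignment block,
    then score each hit against every other aligned row.

    `block` maps a sequence name to its aligned (gapped) text; all rows are
    assumed to have the same length, as in one MAF alignment block.
    """
    ref_row = block[reference]
    # Alignment columns that hold a reference residue (not a gap).
    columns = [i for i, base in enumerate(ref_row) if base != "-"]
    ungapped = "".join(ref_row[i] for i in columns)

    hits = []
    for m in re.finditer(pattern, ungapped, flags=re.IGNORECASE):
        if m.end() == m.start():
            continue  # ignore empty matches
        cols = columns[m.start():m.end()]
        ref_window = "".join(ref_row[c] for c in cols)
        scores = {
            name: conservation_percent(ref_window, "".join(row[c] for c in cols))
            for name, row in block.items()
            if name != reference
        }
        hits.append({"motif": ref_window, "start": m.start(), "conservation": scores})
    return hits


# Hypothetical three-species block; names and sequences are made up.
block = {
    "hg38.chr1": "ACGT-TATAAAGG",
    "mm10.chr4": "ACGTGTATAAAGC",
    "galGal6.chr2": "AC-TGTACAAAGG",
}
for hit in find_conserved_motifs(block, "hg38.chr1", "TATAAA"):
    print(hit["motif"], hit["start"], hit["conservation"])
```

In this toy block the motif is found once in the reference; the second row matches all six motif columns and the third matches five of six, so the two rows score 100% and about 83.3% respectively.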
Pages: 6