MAFin: motif detection in multiple alignment files

Authors
Patsakis, Michail [1, 2]
Provatas, Kimonas [1, 2, 3]
Baltoumas, Fotis A. [4]
Chantzi, Nikol [1, 2]
Mouratidis, Ioannis [1, 2]
Pavlopoulos, Georgios A. [4]
Georgakopoulos-Soares, Ilias [1, 2]
Affiliations
[1] Penn State Univ, Coll Med, Inst Personalized Med, Dept Mol Biol & Pharmacol, 500 Univ Dr,C5716, Hershey, PA 17033 USA
[2] Penn State Univ, Huck Inst Life Sci, University Pk, PA 16802 USA
[3] Univ Crete, Div Basic Sci, Med Sch, Iraklion 71110, Greece
[4] BSRC Alexander Fleming, Inst Fundamental Biomed Res, Vari 16672, Greece
Funding
US National Institutes of Health
Keywords
ELEMENTS; VERTEBRATE; INSECT;
DOI
10.1093/bioinformatics/btaf125
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Whole-genome and whole-proteome alignments, represented in the Multiple Alignment Format (MAF), have become a standard approach in comparative genomics and proteomics. Such analyses often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a novel tool that enables efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.
Results: We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin enables multithreaded search for conserved motifs using three approaches: (i) user-specified k-mers, (ii) regular expressions, in which case one or more patterns are searched, and (iii) predefined Position Weight Matrices. Once a motif has been found, MAFin detects the motif instances and calculates their conservation across the aligned sequences. MAFin also calculates a conservation percentage, based on the number of matches relative to the length of the motif, which indicates the conservation level of each motif across the aligned sequences. A set of statistics enables the interpretation of each motif's conservation level, and the detected motifs are exported in JSON and CSV files for downstream analyses.
Availability and implementation: MAFin is offered as a Python package under the GPL license as a multi-platform application and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.
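The conservation percentage described in the abstract (matches relative to motif length) can be illustrated with a minimal Python sketch. This is not MAFin's code or API: the function name `conservation_of_motif`, the dictionary layout of the alignment block, and the choice of the first sequence as the reference are all assumptions made for illustration. It covers only the k-mer search mode, not regular expressions or Position Weight Matrices.

```python
import re


def conservation_of_motif(block, motif):
    """Hypothetical helper, not part of MAFin.

    block: dict mapping sequence name -> gapped aligned sequence
           (all the same length); the first entry is treated as the reference.
    motif: a k-mer searched in the ungapped reference sequence.
    Returns one record per hit with a conservation percentage per aligned sequence.
    """
    names = list(block)
    ref = block[names[0]].upper()

    # Alignment columns where the reference has a residue (not a gap).
    cols = [i for i, ch in enumerate(ref) if ch != "-"]
    ungapped = "".join(ref[i] for i in cols)

    hits = []
    # A zero-width lookahead reports overlapping occurrences as well.
    for m in re.finditer(f"(?={re.escape(motif.upper())})", ungapped):
        start = m.start()
        motif_cols = cols[start:start + len(motif)]
        per_seq = {}
        for name in names[1:]:
            seq = block[name].upper()
            # A column counts as a match when the aligned residue equals the reference.
            matches = sum(seq[c] == ref[c] for c in motif_cols)
            per_seq[name] = 100.0 * matches / len(motif)
        hits.append({"start": start, "conservation": per_seq})
    return hits


# Toy alignment block (made-up sequences, for illustration only).
example_block = {
    "ref": "ACG-TACGT",
    "sp1": "ACGATACCT",
    "sp2": "AC--TACGT",
}
for hit in conservation_of_motif(example_block, "TACG"):
    print(hit)
```

In this sketch a gap in an aligned sequence simply counts as a mismatch; the actual tool may treat gaps differently.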
Pages: 6