Using protein-domain information for multiple sequence alignment

Cited by: 0
Authors
Al Ait, Layal [1]
Corel, Eduardo [1]
Morgenstern, Burkhard [1]
Affiliation
[1] Univ Gottingen, Dept Bioinformat, Inst Microbiol & Genet, D-37077 Gottingen, Germany
Source
IEEE 12TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS & BIOENGINEERING | 2012
Keywords
Multiple sequence alignment; protein domains; anchored alignment; GENE PREDICTION; ALGORITHM; DATABASE; DNA;
DOI
Not available
Chinese Library Classification number
R318 [Biomedical engineering];
Discipline classification code
0831 ;
Abstract
Most approaches to multiple sequence alignment rely on primary-sequence information. External sources of information, however, can give valuable hints about possible sequence homologies that may not be obvious from sequence comparison alone. Given the huge amount of sequence annotation produced daily, integrating such external information into the alignment process can help produce more biologically meaningful alignments. In this paper, we investigate different approaches to using existing information about protein domains to improve multiple alignments. We use the Pfam database to identify possible domains in protein sequences, and we use this information to align protein sequences with DIALIGN and with a recently developed graph-theoretical approach to multiple alignment. Test runs on BAliBASE and SABmark show that this approach leads to improved alignments.
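The abstract does not say how domain annotations are turned into alignment constraints, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that domain hits are already available as simple records (the `DomainHit` type, the `domain_anchors` function, and the anchor tuple layout are all hypothetical, and the family labels are illustrative), and it pairs hits of the same family in different sequences to propose anchor candidates.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One predicted domain occurrence (hypothetical record, not a Pfam or HMMER type)."""
    seq_id: str
    family: str  # domain family label, e.g. a Pfam accession
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def domain_anchors(hits):
    """Propose anchor candidates from shared domain families.

    Two hits yield a candidate when they belong to the same family but to
    different sequences. Each candidate is a tuple
    (seq_a, start_a, seq_b, start_b, length), where length is the shorter
    of the two domain spans. This tuple layout is an assumption.
    """
    by_family = defaultdict(list)
    for hit in hits:
        by_family[hit.family].append(hit)

    anchors = []
    for family in sorted(by_family):
        for a, b in combinations(by_family[family], 2):
            if a.seq_id == b.seq_id:
                continue  # repeated domain within one sequence: no anchor
            length = min(a.end - a.start, b.end - b.start) + 1
            anchors.append((a.seq_id, a.start, b.seq_id, b.start, length))
    return anchors


# Illustrative input: two sequences share a family, a third does not.
hits = [
    DomainHit("seq1", "PF00069", 10, 60),
    DomainHit("seq2", "PF00069", 25, 70),
    DomainHit("seq3", "PF00018", 5, 40),
]
anchors = domain_anchors(hits)
```

In this sketch only `seq1` and `seq2` produce a candidate, because `seq3` shares no family with the others. A real pipeline would also need to score candidates and resolve conflicting ones before handing them to an anchored aligner.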
Pages: 163 - 168
Number of pages: 6