Using protein-domain information for multiple sequence alignment

Times cited: 0

Authors:

Al Ait, Layal [1]

Corel, Eduardo [1]

Morgenstern, Burkhard [1]

Affiliation:

[1] University of Göttingen, Department of Bioinformatics, Institute of Microbiology and Genetics, D-37077 Göttingen, Germany

Source:

IEEE 12th International Conference on Bioinformatics & Bioengineering | 2012

Keywords:

Multiple sequence alignment; protein domains; anchored alignment; gene prediction; algorithm; database; DNA

DOI:

Not available

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering]

Subject classification code:

0831

Abstract:

Most approaches to multiple sequence alignment rely on primary-sequence information. External sources of information, however, can give valuable hints about sequence homologies that may not be evident from sequence comparison alone. Given the huge amount of sequence annotation produced every day, integrating such external information into the alignment process can help produce biologically more meaningful alignments. In this paper, we investigate different approaches to using existing information about protein domains to improve multiple alignments. We use the Pfam database to identify possible domains in protein sequences, and we use this information to align the sequences with DIALIGN and with a recently developed graph-theoretical approach to multiple alignment. Test runs on the BAliBASE and SABmark benchmarks show that this approach leads to improved alignments.
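The abstract does not spell out how domain hits become alignment constraints. As a rough illustration of the anchored-alignment idea named in the keywords, the Python sketch below assumes domain hits are already available (for example from a Pfam/HMMER search) as (sequence, family, start, end, score) records. It pairs hits of the same family in different sequences into candidate anchors, then greedily keeps a subset in which no two anchors on the same sequence pair overlap or cross. The `DomainHit` and `Anchor` types, the placeholder family labels `PF_A`/`PF_B`, the ungapped anchor taken from the domain start positions, the min-of-two-scores weighting, and the pairwise-only consistency filter are all hypothetical simplifications, not the authors' procedure; a real multiple aligner must check consistency across all sequences at once.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DomainHit:
    """One domain hit on one sequence (hypothetical record layout)."""
    seq_id: str
    family: str   # placeholder family label, e.g. "PF_A"
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float


@dataclass(frozen=True)
class Anchor:
    """Ungapped segment pair of equal length in seq_a and seq_b."""
    seq_a: str
    seq_b: str
    start_a: int
    start_b: int
    length: int
    score: float


def candidate_anchors(hits):
    """Pair hits of the same family that lie in different sequences."""
    anchors = []
    for h1, h2 in combinations(hits, 2):
        if h1.seq_id == h2.seq_id or h1.family != h2.family:
            continue
        if h1.seq_id > h2.seq_id:  # store each sequence pair in a fixed order
            h1, h2 = h2, h1
        # Simplification: align the two domains ungapped from their starts.
        length = min(h1.end - h1.start, h2.end - h2.start) + 1
        anchors.append(Anchor(h1.seq_id, h2.seq_id, h1.start, h2.start,
                              length, min(h1.score, h2.score)))
    return anchors


def before(x, y):
    """True if anchor x ends before anchor y starts in both sequences."""
    return (x.start_a + x.length <= y.start_a
            and x.start_b + x.length <= y.start_b)


def select_consistent(anchors):
    """Greedily keep high-scoring anchors that neither overlap nor cross
    previously kept anchors on the same sequence pair."""
    chosen = []
    for anc in sorted(anchors, key=lambda a: a.score, reverse=True):
        same_pair = [c for c in chosen
                     if (c.seq_a, c.seq_b) == (anc.seq_a, anc.seq_b)]
        if all(before(anc, c) or before(c, anc) for c in same_pair):
            chosen.append(anc)
    return chosen


if __name__ == "__main__":
    # Two toy sequences whose domains occur in opposite order.
    hits = [
        DomainHit("s1", "PF_A", 10, 60, 50.0),
        DomainHit("s1", "PF_B", 80, 120, 40.0),
        DomainHit("s2", "PF_B", 5, 45, 45.0),
        DomainHit("s2", "PF_A", 60, 110, 55.0),
    ]
    for anc in select_consistent(candidate_anchors(hits)):
        print(anc)
```

On this toy input the `PF_B` pair crosses the higher-scoring `PF_A` pair, so only the `PF_A` anchor (positions 10-60 of `s1` against positions 60-110 of `s2`) is kept. In a full pipeline the kept anchors would then be handed to an aligner that accepts anchor points; this sketch stops at the selection step.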

References

Pages: 163-168

Number of pages: 6

33 references in total

[1] Corel, E.; Pitschi, F.; Morgenstern, B. A min-cut algorithm for the consistency problem in multiple sequence alignment. Bioinformatics, 2010, 26(8): 1015-1021.
[2] Do, C. B. In: Proc. RECOMB '06, 2006.
[3] Do, C. B.; Mahabhashyam, M. S. P.; Brudno, M.; Batzoglou, S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 2005, 15(2): 330-340.
[4] Eddy, S. R. Profile hidden Markov models. Bioinformatics, 1998, 14(9): 755-763.
[5] Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004, 32(5): 1792-1797.
[6] Edgar, R. C.; Batzoglou, S. Multiple sequence alignment. Current Opinion in Structural Biology, 2006, 16(3): 368-373.
[7] Finn, R. D.; Tate, J.; Mistry, J.; Coggill, P. C.; Sammut, S. J.; Hotz, H.-R.; Ceric, G.; Forslund, K.; Eddy, S. R.; Sonnhammer, E. L. L.; Bateman, A. The Pfam protein families database. Nucleic Acids Research, 2008, 36: D281-D288.
[8] Finn, R. D.; Clements, J.; Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 2011, 39: W29-W37.
[9] Hunter, S.; Jones, P.; Mitchell, A.; Apweiler, R.; Attwood, T. K.; Bateman, A.; Bernard, T.; Binns, D.; Bork, P.; Burge, S.; de Castro, E.; Coggill, P.; Corbett, M.; Das, U.; Daugherty, L.; Duquenne, L.; Finn, R. D.; Fraser, M.; Gough, J.; Haft, D.; Hulo, N.; Kahn, D.; Kelly, E.; Letunic, I.; Lonsdale, D.; Lopez, R.; Madera, M.; Maslen, J.; McAnulla, C.; McDowall, J.; McMenamin, C.; Mi, H.; Mutowo-Muellenet, P.; Mulder, N.; Natale, D.; Orengo, C.; Pesseat, S.; Punta, M.; Quinn, A. F.; Rivoire, C.; Sangrador-Vegas, A.; Selengut, J. D.; Sigrist, C. J. A.; Scheremetjew, M.; Tate, J.; Thimmajanarthanan, M.; Thomas, P. D.; Wu, C. H.; Yeats, C.; Yong, S.-Y. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Research, 2012, 40(D1): D306-D312.
[10] Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002, 30(14): 3059-3066.
