RLALIGN: A Reinforcement Learning Approach for Multiple Sequence Alignment

Cited by: 9
Authors
Ramakrishnan, Ramchalam Kinattinkara [1]
Singh, Jaspal [1]
Blanchette, Mathieu [1]
Affiliation
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
Source
PROCEEDINGS 2018 IEEE 18TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE) | 2018
Keywords
bioinformatics; machine learning; reinforcement learning; multiple sequence alignment
DOI
10.1109/BIBE.2018.00019
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Multiple sequence alignment (MSA) is one of the best-studied problems in bioinformatics because of the broad set of genomics, proteomics, and evolutionary analyses that rely on it. Yet the problem is NP-hard and existing heuristics are imperfect. Reinforcement learning (RL) techniques have recently emerged as a potential solution to a wide diversity of computational problems, but have yet to be applied to MSA. In this paper, we describe RLALIGN, a method to solve the MSA problem using RL. RLALIGN is based on Asynchronous Advantage Actor Critic (A3C), a cutting-edge RL framework. Due to the absence of a goal state, however, it required several important modifications. RLALIGN can be trained to accurately align moderate-length sequences, and various heuristics allow it to scale to longer sequences. The accuracy of the alignments produced is on par with, and often better than, that of well-established alignment algorithms. Overall, our work demonstrates the potential of RL approaches for complex combinatorial problems such as MSA. RLALIGN will prove useful for realignment tasks, where portions of a larger alignment need to be optimized. Unlike classical algorithms, RLALIGN is agnostic to the nature of the scoring scheme, leading to easy generalization to a variety of problem variants.
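The abstract does not say which scoring scheme RLALIGN optimizes. As a minimal sketch of what a pluggable scoring scheme could look like, the Python below computes a sum-of-pairs score, a common MSA objective chosen here only as an assumption. The names `pair_score` and `sum_of_pairs` and the match/mismatch/gap values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Sequence

GAP = "-"


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score one pair of aligned characters (illustrative values, not the paper's)."""
    if a == GAP and b == GAP:
        return 0  # a gap aligned to a gap is conventionally ignored
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(
    alignment: Sequence[str],
    score: Callable[[str, str], int] = pair_score,
) -> int:
    """Sum the pairwise score over every column and every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += score(a, b)
    return total


if __name__ == "__main__":
    # Three sequences aligned into four columns.
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

Because the objective is passed in as a function, a different scheme (for example, a substitution matrix lookup) can replace `pair_score` without changing the caller, which is the kind of independence from the scoring scheme that the abstract describes.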
Pages: 61-66
Page count: 6