RLALIGN: A Reinforcement Learning Approach for Multiple Sequence Alignment

Cited by: 9
Authors
Ramakrishnan, Ramchalam Kinattinkara [1]
Singh, Jaspal [1]
Blanchette, Mathieu [1]
Affiliation
[1] McGill University, School of Computer Science, Montreal, QC, Canada
Source
PROCEEDINGS 2018 IEEE 18TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE) | 2018
Keywords
bioinformatics; machine learning; reinforcement learning; multiple sequence alignment
DOI
10.1109/BIBE.2018.00019
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Subject classification code
0831
Abstract
Multiple sequence alignment (MSA) is one of the best-studied problems in bioinformatics because of the broad set of genomic, proteomic, and evolutionary analyses that rely on it. Yet the problem is NP-hard, and existing heuristics are imperfect. Reinforcement learning (RL) techniques have recently emerged as a potential solution to a wide variety of computational problems but have yet to be applied to MSA. In this paper, we describe RLALIGN, a method that solves the MSA problem using RL. RLALIGN is based on Asynchronous Advantage Actor-Critic (A3C), a state-of-the-art RL framework; because MSA has no goal state, however, A3C required several important modifications. RLALIGN can be trained to accurately align moderate-length sequences, and various heuristics allow it to scale to longer sequences. The accuracy of the alignments it produces is on par with, and often better than, that of well-established alignment algorithms. Overall, our work demonstrates the potential of RL approaches for complex combinatorial problems such as MSA. RLALIGN will prove useful for realignment tasks, where portions of a larger alignment need to be optimized. Unlike classical algorithms, RLALIGN is agnostic to the scoring scheme, so it generalizes easily to a variety of problem variants.
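The abstract does not state which scoring scheme or reward RLALIGN uses. As a rough illustration of what "agnostic to the scoring scheme" can mean for an RL aligner, the sketch below assumes a standard sum-of-pairs objective in which the per-pair scoring function is a plug-in argument, so a learner that only sees the resulting number never depends on the scheme itself. The names `simple_scorer`, `sum_of_pairs`, and `score_delta`, the score values (+1 match, -1 mismatch, -2 gap), and the idea of using the score change as a per-step reward are all hypothetical, not taken from the paper.

```python
from itertools import combinations
from typing import Callable, List

GAP = "-"

# A scoring scheme maps a pair of alignment characters to a score.
PairScorer = Callable[[str, str], float]


def simple_scorer(a: str, b: str, match: float = 1.0,
                  mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Hypothetical scheme: +1 match, -1 mismatch, -2 for a gap against a residue."""
    if a == GAP and b == GAP:
        return 0.0  # gap-gap pairs are conventionally ignored in sum-of-pairs
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: List[str], scorer: PairScorer = simple_scorer) -> float:
    """Sum-of-pairs score of an MSA given as equal-length gapped rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    total = 0.0
    for col in range(length):
        # Score every unordered pair of rows in this column.
        for row_i, row_j in combinations(alignment, 2):
            total += scorer(row_i[col], row_j[col])
    return total


def score_delta(before: List[str], after: List[str],
                scorer: PairScorer = simple_scorer) -> float:
    """Hypothetical per-step reward: change in sum-of-pairs after one edit."""
    return sum_of_pairs(after, scorer) - sum_of_pairs(before, scorer)


if __name__ == "__main__":
    msa = ["ACG-T", "AC-GT", "ACGGT"]
    print(sum_of_pairs(msa))  # 3.0 under simple_scorer
    # Swapping in a different scheme requires no change to the scoring loop.
    identity = lambda a, b: 1.0 if a == b and a != GAP else 0.0
    print(sum_of_pairs(msa, identity))  # 11.0
    print(score_delta(msa, ["ACG-T", "ACG-T", "ACGGT"]))  # 5.0
```

Because the scorer is passed in rather than hard-coded, the same loop can score nucleotide or protein alignments under any substitution matrix and gap model. Since MSA has no natural goal state, a score difference such as `score_delta` is one plausible way to give an agent a dense signal after each edit; whether RLALIGN does anything similar is not stated in the abstract.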
Pages: 61-66
Page count: 6