Incremental & Semi-Supervised Learning for Functional Analysis of Protein Sequences

Cited by: 1
Authors
Halac, Mali [1]
Sokhansanj, Bahrad [1]
Trimble, William L. [2]
Coard, Thomas [1]
Sabin, Norman C., Jr. [3]
Ozdogan, Emrecan [3]
Polikar, Robi [3]
Rosen, Gail L. [1]
Affiliations
[1] Drexel Univ, Elect & Comp Engn, Philadelphia, PA 19104 USA
[2] Univ Chicago, Argonne Natl Lab, Chicago, IL 60637 USA
[3] Rowan Univ, Elect & Comp Engn, Glassboro, NJ USA
Source
2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021) | 2021
Keywords
Incremental clustering; semi-supervised learning; functional annotation; protein sequence; alignment; tool
DOI
10.1109/SSCI50451.2021.9659958
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current approaches to the functional annotation of proteins rely on training a classifier on a fixed reference database. As more genes are sequenced, the reference database grows, and classifiers are retrained on the old data together with some new data. Given the ever-increasing volume of (meta-)genomic data, repeating this process is computationally expensive. An alternative is to update the classifier continuously from a stream of data. In this study, we therefore propose an incremental and semi-supervised learning approach to training a classifier for the functional analysis of protein sequences. Our method has a low computational cost while maintaining high accuracy in predicting protein functions.
Pages: 8