Connecting the Dots in Self-Supervised Learning: A Brief Survey for Beginners

Times cited: 4
Authors
Fang, Peng-Fei [1, 2]
Li, Xian [1]
Yan, Yang [1, 3]
Zhang, Shuai [1, 3]
Kang, Qi-Yue [1]
Li, Xiao-Fei [1]
Lan, Zhen-Zhong [1]
Affiliations
[1] WestLake Univ, Sch Engn, Hangzhou 310030, Peoples R China
[2] Australian Natl Univ, Coll Engn & Comp Sci, Canberra, ACT 2601, Australia
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Keywords
artificial intelligence (AI); dot; self-supervised learning (SSL); survey; REPRESENTATION;
DOI
10.1007/s11390-022-2158-x
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The artificial intelligence (AI) community has recently made tremendous progress in developing self-supervised learning (SSL) algorithms that can learn high-quality data representations from massive amounts of unlabeled data. These methods have delivered strong results even in fields outside AI. Thanks to the joint efforts of researchers in various areas, new SSL methods appear almost daily. However, the sheer number of publications makes it difficult for beginners to see clearly how the field has progressed. This survey bridges this gap by carefully selecting a small set of papers that we believe are milestones or essential work. We regard these works as the "dots" of SSL and connect them by tracing how they evolved. We hope that, by viewing the connections between these dots, readers will gain a high-level picture of how SSL has developed across multiple disciplines, including natural language processing, computer vision, graph learning, audio processing, and protein learning.
Pages: 507-526
Number of pages: 20