MocGCL: Molecular Graph Contrastive Learning via Negative Selection

Cited by: 1

Authors:

Cui, Jinhao [1]

Chai, Heyan [1]

Gong, Yanbin [2]

Ding, Ye [3]

Hua, Zhongyun [1]

Gao, Cuiyun [1]

Liao, Qing [1,4]

Affiliations:

[1] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China

[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[3] Dongguan Univ Technol, Dongguan, Peoples R China

[4] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Graph contrastive learning; molecular classification; self-supervised learning; CLASSIFICATION; PREDICTION;

DOI:

10.1109/IJCNN54540.2023.10191518

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Molecular classification has benefited substantially from the recent success of graph contrastive learning (GCL), which pulls positive samples together and pushes negative samples apart. GCL methods generate positive and negative samples via graph augmentation. Because graph augmentation corrupts structure, not all generated negative samples retain discriminative semantics. However, existing GCL methods ignore the differences among negative samples and assume that all negative samples are equally important, which degrades molecular classification performance. To address this issue, we propose a novel molecular graph contrastive learning model (MocGCL) that selects more useful negative samples to improve molecular classification performance. Specifically, we first employ different encoders to generate positive samples, which improves their diversity. Then, we design a negative generation step to produce negative samples and define semantic integrity to measure the usefulness of the generated negative samples. Moreover, we propose a novel negative selection mechanism that dynamically selects the more useful negative samples to improve the molecular representation. In addition, we improve the contrastive loss to adaptively adjust the distance between selected negative samples, which preserves their distinctive properties in the sample space. Extensive experiments on six typical bioinformatics datasets demonstrate the effectiveness of MocGCL compared with most state-of-the-art methods.
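For readers unfamiliar with the setting, the following is a minimal sketch of a generic InfoNCE-style contrastive loss combined with top-k negative selection. It is not the MocGCL implementation: the function names, the `usefulness` scores (a stand-in for the abstract's semantic integrity measure, whose definition is not given here), the cosine similarity, the value of `k`, and the temperature are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_negatives(negatives, usefulness, k):
    """Keep the k negatives with the highest usefulness score.

    `usefulness` is a hypothetical per-negative score; how the paper
    computes semantic integrity is not specified in the abstract.
    """
    ranked = sorted(range(len(negatives)),
                    key=lambda i: usefulness[i], reverse=True)
    return [negatives[i] for i in ranked[:k]]


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss: low when the anchor is close to the
    positive and far from every negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings (illustrative values only).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
usefulness = [0.1, 0.9, 0.5]

selected = select_negatives(negatives, usefulness, k=2)
loss = info_nce(anchor, positive, selected)
```

In this toy example the first negative is identical to the anchor and receives a low score, so selection discards it before the loss is computed. A real model would compute embeddings with a graph encoder and learn from the loss gradient.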

Pages: 8
