MAGNeto: An Efficient Deep Learning Method for the Extractive Tags Summarization Problem

Cited by: 0

Authors:

Hieu Trong Phung ^{[1,2]}

Anh Tuan Vu ^{[1]}

Tung Dinh Nguyen ^{[1]}

Lam Thanh Do ^{[1,2]}

Giang Nam Ngo ^{[1]}

Trung Thanh Tran ^{[1]}

Le, Ngoc C. ^{[1,2]}

Affiliations:

[1] PIXTA Vietnam, 8th Floor, Truong Thinh Bldg, Hanoi, Vietnam

[2] Hanoi Univ Sci & Technol, 1 Dai Co Viet Rd, Hanoi, Vietnam

Source:

PROCEEDINGS OF SEVENTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, ICICT 2022, VOL 1 | 2023 / Vol. 447

Keywords:

Deep learning; Language and vision; Self-supervision;

DOI:

10.1007/978-981-19-1607-6_26

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this work, we study a new image annotation task named Extractive Tags Summarization (ETS). The goal is to extract the important tags from the context provided by an image and its corresponding tags. We adapt several state-of-the-art deep learning models to use both visual and textual information. Our proposed solution consists of widely used blocks such as convolutional and self-attention layers, together with a novel combination of auxiliary loss functions and a gating mechanism that glues and elevates these fundamental components into a unified architecture. In addition, we introduce a simple but effective data augmentation technique designed to alleviate the effect of outliers on the final results. Finally, we explore a self-supervised pre-training strategy that further boosts the performance of the model by making use of the abundant unlabeled data available. Our model achieves a 90% F1 score on the public NUS-WIDE benchmark and a 50% F1 score on a noisy, large-scale, real-world private dataset. Source code for reproducing the experiments is publicly available at: https://github.com/pixta-dev/labteam.

Citation

Pages: 297-309

Page count: 13

