MAGNeto: An Efficient Deep Learning Method for the Extractive Tags Summarization Problem

Cited by: 0
Authors
Hieu Trong Phung [1, 2]
Anh Tuan Vu [1 ]
Tung Dinh Nguyen [1 ]
Lam Thanh Do [1, 2]
Giang Nam Ngo [1 ]
Trung Thanh Tran [1 ]
Ngoc C. Le [1, 2]
Affiliations
[1] PIXTA Vietnam, 8th Floor, Truong Thinh Bldg, Hanoi, Vietnam
[2] Hanoi Univ Sci & Technol, 1 Dai Co Viet Rd, Hanoi, Vietnam
Source
PROCEEDINGS OF SEVENTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, ICICT 2022, VOL 1 | 2023 / Vol. 447
Keywords
Deep learning; Language and vision; Self-supervision
DOI
10.1007/978-981-19-1607-6_26
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we study a new image annotation task named Extractive Tags Summarization (ETS). The goal is to extract important tags from the context provided by an image and its corresponding tags. We adapt several state-of-the-art deep learning models to utilize both visual and textual information. Our proposed solution consists of widely used blocks such as convolutional and self-attention layers, together with a novel combination of auxiliary loss functions and a gating mechanism that glues and elevates these fundamental components into a unified architecture. In addition, we introduce a simple but effective data augmentation technique dedicated to alleviating the effect of outliers on the final results. Finally, we explore a self-supervised pre-training strategy that further boosts the model's performance by making use of the abundant unlabeled data available. Our model achieves a 90% F1 score on the public NUS-WIDE benchmark and a 50% F1 score on a noisy, large-scale, real-world private dataset. Source code for reproducing the experiments is publicly available at: https://github.com/pixta-dev/labteam.
Pages: 297-309
Page count: 13