Acoustic-based LEGO recognition using attention-based convolutional neural networks

Authors
Van-Thuan Tran
Chia-Yang Wu
Wei-Ho Tsai
Affiliation
[1] National Taipei University of Technology, Department of Electronic Engineering
Source
Artificial Intelligence Review | 2024 / Vol. 57
Keywords
LEGO recognition; Acoustic-based object detection; Attention mechanism; Audio classification; Audio features; Convolutional neural networks; Time-distributed layers;
Abstract
This work investigates the classification of LEGO types using deep learning-based audio classification approaches. The investigation rests on the following assumption: if objects of the same shape fall freely from a certain height and hit a fixed plane, their impact sounds will be very similar, so objects of one type can be distinguished from objects of other types. Applying this idea to LEGO recognition, we collect impact sounds of 200 LEGO objects that fall from a height of about 30 cm above a designated plane, and design a CNN-based recognition system that processes the impact sounds to determine the type of LEGO each object belongs to. Recognizing that the fall of a LEGO object produces a main impact sound (i.e., only the sound at the moment of impact) followed by several subsequent sounds, we examine whether considering only the first impact sound or all sounds yields better classification accuracy. We propose a compact two-dimensional CNN model, namely LegoNet, which is designed with a frame-level attention module at the input spectrogram and time-distributed fully-connected layers. Our experiments show that free-fall impact sounds can be used efficiently for accurate object recognition, and that the proposed LegoNet, despite its much smaller size, achieves better accuracy and robustness than the baseline models. In addition, using the whole sequence of impact sounds is more informative for LEGO classification than considering only the first impact sound. Moreover, utilizing data of specific object postures is found to improve the classifier’s performance when training data are scarce. The proposed approach can be employed as an extra module to build intelligent agents or object classification systems that require a rich understanding of the surrounding physical world.
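The abstract names two architectural ideas, frame-level attention over the input spectrogram and time-distributed fully-connected layers, without giving their exact form. The following is a minimal pure-Python sketch of what such components might look like, not the authors' LegoNet. It assumes a softmax over per-frame linear scores for the attention and a shared ReLU dense layer applied to every frame; the function names, weights, and toy input are all hypothetical.

```python
import math

def frame_attention(spec, w, b=0.0):
    """Weight each time frame of a T x F spectrogram by a softmax score.

    Assumption: the score is a simple linear function of the frame;
    the paper's actual attention module may differ.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) + b for frame in spec]
    m = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]            # one weight per frame, sums to 1
    attended = [[a * x for x in frame] for a, frame in zip(weights, spec)]
    return weights, attended

def time_distributed_dense(frames, W, bias):
    """Apply the same dense layer (W: F x H, bias: H) with ReLU to every frame."""
    out = []
    for frame in frames:
        h = [max(0.0, sum(x * W[i][j] for i, x in enumerate(frame)) + bias[j])
             for j in range(len(bias))]
        out.append(h)
    return out

# Toy input (hypothetical): 3 time frames x 2 frequency bins.
spec = [[1.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
weights, attended = frame_attention(spec, w=[1.0, 1.0])
hidden = time_distributed_dense(attended, W=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, 0.0])
```

In this toy case the silent middle frame receives the smallest attention weight, which illustrates why frame-level weighting could help a classifier focus on the impact events rather than the gaps between them.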