Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition

Cited by: 12
Authors
Ghosh, Mridul [1]
Roy, Sayan Saha [2]
Mukherjee, Himadri [3]
Obaidullah, Sk Md [4]
Santosh, K. C. [5]
Roy, Kaushik [3]
Affiliations
[1] Shyampur Siddheswari Mahavidyalaya, Dept Comp Sci, Howrah, India
[2] Calcutta Univ, Dept Radio Phys & Elect, Kolkata, India
[3] West Bengal State Univ, Dept Comp Sci, Kolkata, India
[4] Aliah Univ, Dept Comp Sci & Engn, Kolkata, India
[5] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
Keywords
Transfer learning; CNN; Movie title; Script identification; Image
DOI
10.1007/s00371-021-02094-6
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Graphic-rich texts are common in posters. A movie poster carries information such as the movie title, tag lines, and the names of the actors, director, and production house. Graphic-rich text in a movie title conveys not only sentiment but also genre. Understanding the poster requires graphic-rich text recognition. Prior to that, one requires text localization, so that background and foreground graffiti can be well segmented. In this paper, we propose a transfer learning-based approach for graphic-rich text localization, which was tuned by introducing reverse augmentation and a rotated/inclined rectangle drawing technique. A convolutional neural network-based model is then applied to identify the script of each localized text region. In our experiments on a newly developed dataset (available upon request) of 1154 movie poster images with multiple scripts, we achieved an average accuracy of 99.30%. Our results outperformed previously developed tools that rely on handcrafted features.
Pages: 1645-1664
Page count: 20