Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition

Cited by: 12

Authors:

Ghosh, Mridul [1]

Roy, Sayan Saha [2]

Mukherjee, Himadri [3]

Obaidullah, Sk Md [4]

Santosh, K. C. [5]

Roy, Kaushik [3]

Affiliations:

[1] Shyampur Siddheswari Mahavidyalaya, Dept Comp Sci, Howrah, India

[2] Calcutta Univ, Dept Radio Phys & Elect, Kolkata, India

[3] West Bengal State Univ, Dept Comp Sci, Kolkata, India

[4] Aliah Univ, Dept Comp Sci & Engn, Kolkata, India

[5] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA

Source:

VISUAL COMPUTER | 2022 / Vol. 38 / Issue 05

Keywords:

Transfer learning; CNN; Movie title; SCRIPT IDENTIFICATION; IMAGE;

DOI:

10.1007/s00371-021-02094-6

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Graphic-rich text is common in posters. A movie poster carries information such as the movie title, tag lines, and the names of the actors, director, and production house. Graphic-rich text in movie titles conveys not only sentiment but also genre. Understanding a poster therefore requires graphic-rich text recognition, which in turn requires text localization so that background and foreground graffiti can be well segmented. In this paper, we propose a transfer learning-based approach for graphic-rich text localization, tuned by introducing reverse augmentation and a rotated/inclined rectangle drawing technique. A convolutional neural network-based model is then applied to identify the corresponding scripts. In our experiments on a newly developed dataset (available upon request) of 1154 movie poster images in multiple scripts, we achieved an average accuracy of 99.30%. Our results outperformed previously developed tools that rely on handcrafted features.
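To make the idea of a rotated/inclined rectangle concrete, the following is a minimal geometric sketch and not the authors' implementation. It assumes a text region is described by a centre, a width, a height, and an inclination angle, and computes the four corner points of the inclined box. The function name `rotated_rect_corners` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def rotated_rect_corners(cx, cy, width, height, angle_deg):
    """Return the four corners of a rectangle centred at (cx, cy).

    Hypothetical helper for illustration only. The rectangle is rotated by
    angle_deg using the standard 2D rotation matrix: counter-clockwise in a
    y-up frame, which appears clockwise in image coordinates (y pointing down).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0

    # Corner offsets of the axis-aligned box, listed in order around it.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]

    corners = []
    for dx, dy in offsets:
        # Rotate each offset, then translate it to the box centre.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


# Example: an 80 x 20 text box centred at (100, 50), inclined by 15 degrees.
corners = rotated_rect_corners(100.0, 50.0, 80.0, 20.0, 15.0)
```

The returned corner list can be passed to any polygon-drawing routine. An inclined box of this kind fits slanted title text more tightly than an axis-aligned box would, which is the general motivation for such rectangles in text localization.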


Pages: 1645-1664

Page count: 20
