TEX-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition

Cited by: 6

Authors:

Anwer, Rao Muhammad [1]

Khan, Fahad Shahbaz [2]

van de Weijer, Joost [3]

Laaksonen, Jorma [1]

Affiliations:

[1] Aalto Univ, Sch Sci, Dept Comp Sci, Espoo, Finland

[2] Linkoping Univ, Comp Vis Lab, Linkoping, Sweden

[3] Univ Autonoma Barcelona, Comp Vis Ctr Barcelona, Barcelona, Spain

Source:

PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17) | 2017

Funding:

Academy of Finland;

Keywords:

Convolutional Neural Networks; Texture Recognition; Local Binary Patterns; CLASSIFICATION; COLOR; FEATURES;

DOI:

10.1145/3078971.3079001

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recognizing materials and textures in realistic imaging conditions is a challenging computer vision problem. For many years, orderless representations based on local features were the dominant approach for texture recognition. Recently, deep local features, extracted from the intermediate layers of a Convolutional Neural Network (CNN), have been used as filter banks. These dense local descriptors from a deep model, when encoded with Fisher Vectors, have been shown to provide excellent results for texture recognition. The CNN models employed in such approaches take RGB patches as input and are trained on a large number of labeled images. We show that CNN models, which we call TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard deep models trained on RGB patches. We further investigate two deep architectures, namely early and late fusion, to combine the texture and color information. Experiments on benchmark texture datasets clearly demonstrate that TEX-Nets provide complementary information to the standard RGB deep network. Our approach provides large accuracy gains of 4.8%, 3.5%, 2.6% and 4.1% on the DTD, KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, respectively, compared to the standard RGB network of the same architecture. Further, our final combination leads to consistent improvements over the state-of-the-art on all four datasets.


Pages: 130-137

Number of pages: 8

