Simplicity of Kmeans versus Deepness of Deep Learning: A Case of Unsupervised Feature Learning with Limited Data

Cited by: 24
Authors
Dundar, Murat [1 ]
Kou, Qiang [2 ]
Zhang, Baichuan [1 ]
He, Yicheng [1 ]
Rajwa, Bartek [3 ]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA
[2] Indiana Univ Purdue Univ, Dept Biohlth Informat, Indianapolis, IN 46202 USA
[3] Purdue Univ, Bindley Biosci Ctr, W Lafayette, IN 47907 USA
Source
2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) | 2015
Funding
U.S. National Science Foundation
DOI
10.1109/ICMLA.2015.78
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology]
Discipline code
0812
Abstract
We study a bio-detection application as a case study to demonstrate that Kmeans-based unsupervised feature learning can be a simple yet effective alternative to deep learning for small data sets with limited intra- as well as inter-class diversity. We investigate the effect on classifier performance of data augmentation and of feature extraction with multiple patch sizes and at different image scales. Our data set includes 1833 images from four different classes of bacteria, with each bacterial culture captured at three different wavelengths and the overall data collected over a three-day period. The limited number and diversity of the images, potential random effects across days, and the multi-modal nature of the class distributions pose a challenging setting for representation learning. Using images collected on the first day for training, on the second day for validation, and on the third day for testing, Kmeans-based representation learning achieves 97% classification accuracy on the test data. This compares very favorably to the 56% accuracy achieved by deep learning and the 74% accuracy achieved by handcrafted features. Our results suggest that data augmentation or dropping connections between units offers little help to deep-learning algorithms in this setting, whereas Kmeans-based representation learning gains a significant boost from augmenting the data and from concatenating features obtained at multiple patch sizes or image scales.
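The Kmeans feature-learning recipe the abstract describes follows the single-layer pipeline of Coates et al. (ref. [7]): sample and normalize random patches, cluster them with Kmeans, encode each patch against the learned centroids, pool per image, and concatenate features across patch sizes. A minimal sketch under assumed settings (synthetic stand-in images; the patch sizes, patch counts, and 16 clusters are illustrative, not the paper's actual parameters):

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_patches(images, patch_size, n_patches, rng):
    # Sample random square patches from grayscale images,
    # with per-patch brightness/contrast normalization.
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch_size + 1)
        x = rng.integers(img.shape[1] - patch_size + 1)
        p = img[y:y + patch_size, x:x + patch_size].ravel().astype(float)
        patches.append((p - p.mean()) / (p.std() + 1e-8))
    return np.array(patches)

def triangle_features(patches, kmeans):
    # Soft "triangle" encoding from Coates et al.:
    # f_k = max(0, mean distance to all centroids - distance to centroid k)
    d = np.linalg.norm(patches[:, None, :] - kmeans.cluster_centers_[None], axis=2)
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(20)]  # stand-in for bacterial images

feats = []
for patch_size in (6, 8):  # multiple patch sizes, features concatenated below
    km = KMeans(n_clusters=16, n_init=4, random_state=0)
    km.fit(extract_patches(images, patch_size, 500, rng))
    # Encode patches of each image and average-pool into one descriptor.
    per_image = [triangle_features(extract_patches([img], patch_size, 50, rng), km).mean(axis=0)
                 for img in images]
    feats.append(np.array(per_image))

features = np.hstack(feats)  # 20 images x (16 + 16) features
print(features.shape)
```

The pooled descriptors would then feed a linear classifier (the paper cites LIBLINEAR, ref. [8]); the triangle encoding keeps features sparse and non-negative, which is what makes the single Kmeans layer competitive on small data.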
Pages: 883-888
Page count: 6
References (14 total)
[1] [Anonymous], 2006, ADV NEURAL INF PROCE.
[2] [Anonymous], 2010, P PYTHON SCI COMPUTI.
[3] [Anonymous], 2008, P ICML, DOI 10.1145/1390156.1390294.
[4] Bayraktar B., Banada P.P., Hirleman E.D., Bhunia A.K., Robinson J.P., Rajwa B. Feature extraction from light-scatter patterns of Listeria colonies for identification and classification. Journal of Biomedical Optics, 2006, 11(3).
[5] Blum M., 2011, NIPS Workshop on Deep Learning.
[6] Coates A., 2012, Neural Networks: Tricks of the Trade, 2nd ed., LNCS 7700, p. 561, DOI 10.1007/978-3-642-35289-8_30.
[7] Coates A., 2011, Proceedings of the 14th AISTATS, p. 215.
[8] Fan R.E., 2008, Journal of Machine Learning Research, 9:1871.
[9] Haralick R.M., Shanmugam K., Dinstein I. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 1973, SMC-3(6):610-621.
[10] Khotanzad A., Hong Y.H. Invariant image recognition by Zernike moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(5):489-497.