This paper presents a deep learning model of building up hierarchical image representation. Each layer of hierarchy consists of three components: sparse coding, saliency pooling and local grouping. With sparse coding we identify distinctive coefficients for representing raw features of each lower layer; saliency pooling helps suppress noise and enhance translation invariance of sparse representation; we group locally pooled sparse codes to form more complex representations. Instead of using hand-crafted descriptors, our model learns an effective image representation directly from images in a unsuper-vised data-driven manner. We evaluate our algorithm with several benchmark databases of object recognition and analyze the contributions of different components. Experimental results show that our algorithm performs favorably against the state-of-the-art methods.
|Publication status||Published - 2011 Jan 1|
|Event||2011 22nd British Machine Vision Conference, BMVC 2011 - Dundee, United Kingdom|
Duration: 2011 Aug 29 → 2011 Sep 2
|Conference||2011 22nd British Machine Vision Conference, BMVC 2011|
|Period||11/8/29 → 11/9/2|
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition