DFT-based transformation invariant pooling layer for visual classification

Jongbin Ryu, Ming-Hsuan Yang, Jongwoo Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We propose a novel discrete Fourier transform (DFT)-based pooling layer for convolutional neural networks. The DFT magnitude pooling replaces the traditional max/average pooling layer between the convolution and fully connected layers to retain translation-invariant and shape-preserving (i.e., aware of shape differences) properties, based on the shift theorem of the Fourier transform. Thanks to its ability to handle image misalignment while keeping important structural information in the pooling stage, DFT magnitude pooling improves classification accuracy significantly. In addition, we propose the DFT+ method, which builds ensemble networks using the outputs of the middle convolution layers. The proposed methods are extensively evaluated on various classification tasks using the ImageNet, CUB 2010-2011, MIT Indoors, Caltech 101, FMD, and DTD datasets. AlexNet, VGG-VD 16, Inception-v3, and ResNet are used as the base networks, upon which the DFT and DFT+ methods are implemented. Experimental results show that the proposed methods improve classification performance on all networks and datasets.
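
The sketch below illustrates the idea described in the abstract: replacing global max/average pooling with the magnitude of the 2D DFT of the last convolution output, which by the shift theorem is unaffected by spatial translation while still encoding the shape of the activations. This is a minimal PyTorch sketch under stated assumptions; the class name, the output size, and the choice to keep a low-frequency block of the magnitude spectrum are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a DFT magnitude pooling layer (PyTorch).
# Assumptions: the layer replaces global max/average pooling after the last
# convolution and keeps only a low-frequency block of the magnitude spectrum.
# Names and the crop size are illustrative, not the authors' implementation.
import torch
import torch.nn as nn


class DFTMagnitudePooling(nn.Module):
    def __init__(self, out_size: int = 3):
        super().__init__()
        self.out_size = out_size  # side length of the retained low-frequency block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) feature map from the last convolution layer.
        spectrum = torch.fft.fft2(x)      # complex 2D DFT of each channel
        magnitude = spectrum.abs()        # shift theorem: a spatial translation only
                                          # changes the phase, so the magnitude is invariant
        k = self.out_size
        pooled = magnitude[..., :k, :k]   # keep low-frequency coefficients (shape information)
        return torch.flatten(pooled, 1)   # vector fed to the fully connected layers


# Usage example: pool a ResNet-style 7x7x2048 feature map into a 2048*3*3 vector.
pool = DFTMagnitudePooling(out_size=3)
features = pool(torch.randn(8, 2048, 7, 7))   # -> shape (8, 18432)
```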

Original language: English
Title of host publication: Computer Vision – ECCV 2018 – 15th European Conference, 2018, Proceedings
Editors: Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss, Martial Hebert
Publisher: Springer Verlag
Pages: 89-104
Number of pages: 16
ISBN (Print): 9783030012632
DOIs: https://doi.org/10.1007/978-3-030-01264-9_6
Publication status: Published - 2018 Jan 1
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 2018 Sep 8 – 2018 Sep 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11218 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country: Germany
City: Munich
Period: 18/9/8 – 18/9/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Ryu, J., Yang, M. H., & Lim, J. (2018). DFT-based transformation invariant pooling layer for visual classification. In V. Ferrari, C. Sminchisescu, Y. Weiss, & M. Hebert (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 89-104). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11218 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01264-9_6