OCEAN: Object-centric arranging network for self-supervised visual representations learning

Changjae Oh, Bumsub Ham, Hansung Kim, Adrian Hilton, Kwanghoon Sohn

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Learning visual representations plays an important role in computer vision and machine learning applications, as it enables a model to understand and perform high-level tasks. A common approach to learning visual representations is supervised learning, which requires a large amount of human annotation to train the model. This paper presents a self-supervised approach that learns visual representations from input images without human annotations. We learn the correct arrangement of object proposals to represent an image using a convolutional neural network (CNN) without any manual annotations. We hypothesize that a network trained to solve this problem must embed semantic visual representations. Unlike existing approaches that use uniformly sampled patches, we relate object proposals that contain prominent objects and object parts. More specifically, we discover a representation that considers the overlap, inclusion, and exclusion relationships of proposals as well as their relative positions. This allows the network to focus on potential objects and parts rather than on clutter. We demonstrate that our model outperforms existing self-supervised learning methods and can serve as a generic feature extractor, which we show on object detection, classification, action recognition, image retrieval, and semantic matching tasks.
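The arrangement pretext task described above hinges on pairwise spatial relations between object proposals. This record contains no code, so the Python sketch below is only a plausible illustration of how the overlap, inclusion, and exclusion relations and relative positions mentioned in the abstract could be computed from proposal boxes; the function names and the 0.9 containment threshold are assumptions, not the authors' implementation.

# Minimal sketch (not from the paper): deriving the pairwise relations
# named in the abstract from proposal boxes given as (x1, y1, x2, y2).

def intersection_area(a, b):
    """Area of the overlap region of two boxes; 0 when they are disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def relation(a, b, contain_thresh=0.9):
    """Classify a proposal pair as 'exclusion', 'inclusion', or 'overlap'.

    contain_thresh is an assumed value: a pair counts as inclusion when
    the intersection covers at least 90% of the smaller box.
    """
    inter = intersection_area(a, b)
    if inter == 0:
        return "exclusion"
    if inter / min(box_area(a), box_area(b)) >= contain_thresh:
        return "inclusion"
    return "overlap"

def relative_position(a, b):
    """Center-to-center offset of b from a, a coarse cue for arrangement."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return (bx - ax, by - ay)

if __name__ == "__main__":
    car = (10, 10, 200, 120)    # a proposal around a whole object
    wheel = (20, 80, 60, 118)   # a proposal around one of its parts
    print(relation(car, wheel))           # -> inclusion
    print(relative_position(car, wheel))  # -> (-65.0, 34.0)

A network trained on the arrangement pretext task would take proposal crops as input and predict relation labels of this kind, so that solving the task forces it to learn semantic features; the exact targets used in OCEAN may differ from this simplification.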

Original language: English
Pages (from-to): 281-292
Number of pages: 12
Journal: Expert Systems with Applications
Volume: 125
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2019.01.073
Publication status: Published - 2019 Jul 1

Fingerprint

  • Semantics
  • Supervised learning
  • Image retrieval
  • Computer vision
  • Learning systems
  • Neural networks
  • Object detection

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Oh, C., Ham, B., Kim, H., Hilton, A., & Sohn, K. (2019). OCEAN: Object-centric arranging network for self-supervised visual representations learning. Expert Systems with Applications, 125, 281-292. https://doi.org/10.1016/j.eswa.2019.01.073