FOSNet: An end-to-end trainable deep neural network for scene recognition

Hongje Seong, Junhyuk Hyun, Euntai Kim

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)


Scene recognition is an image recognition problem that aims to predict the category of the place where an image was taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The proposed method is based on the fusion of the object and scene information in a given image, and the CNN framework is named FOS (fusion of object and scene) Net. To combine the object and scene information effectively, a new fusion framework named CCG (correlative context gating) is proposed. In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on the idea that the scene class does not change across the image. FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.30% on MIT indoor 67. The second-highest performance, 77.28%, was obtained on SUN 397.
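The abstract names two ideas without giving their formulas: gating that fuses object and scene features, and a loss that rewards a scene prediction that stays the same across the image. The following is a minimal sketch of both ideas under stated assumptions, not the formulation used in the paper. The function names (`gated_fusion`, `coherence_penalty`), the sigmoid gate, and the squared-difference penalty between neighbouring cells are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(scene_feat, object_feat, gate_weights, gate_bias):
    """Scale each scene feature by a sigmoid gate computed from object features.

    gate_weights[i][j] weights object feature j when gating scene feature i.
    ASSUMPTION: a generic gating form, not the paper's CCG definition.
    """
    fused = []
    for i, s in enumerate(scene_feat):
        pre = gate_bias[i] + sum(w * o for w, o in zip(gate_weights[i], object_feat))
        fused.append(s * sigmoid(pre))
    return fused


def coherence_penalty(score_map):
    """Mean squared difference between class probabilities of adjacent cells.

    score_map[r][c] is a list of per-class scores at grid cell (r, c).
    ASSUMPTION: one simple way to penalise predictions that change over
    the image, not the paper's SCL definition.
    """
    probs = [[softmax(cell) for cell in row] for row in score_map]
    rows, cols = len(probs), len(probs[0])
    total, pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            # Compare with the right and bottom neighbours only,
            # so that each adjacent pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += sum((a - b) ** 2
                                 for a, b in zip(probs[r][c], probs[rr][cc]))
                    pairs += 1
    return total / pairs if pairs else 0.0


# Toy 2x2 score maps with two scene classes.
coherent = [[[2.0, 0.0], [2.0, 0.0]], [[2.0, 0.0], [2.0, 0.0]]]
mixed = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [2.0, 0.0]]]
print(coherence_penalty(coherent), coherence_penalty(mixed))
```

In this sketch, a score map that predicts the same class at every cell incurs no penalty, while a map in which one cell disagrees with its neighbours incurs a positive one. In training, such a term would typically be added to a standard classification loss.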

Original language: English
Article number: 9076601
Pages (from-to): 82066-82077
Number of pages: 12
Journal: IEEE Access
Publication status: Published - 2020

Bibliographical note

Funding Information:
This work was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT, under Grant NRF-2017M3C4A7069370.

Publisher Copyright:
© 2013 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)

