Multi-Task Self-Supervised Visual Representation Learning for Monocular Road Segmentation

Laehoon Cho, Youngjung Kim, Hyungjoo Jung, Changjae Oh, Jaesung Youn, Kwanghoon Sohn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Training deep networks commonly follows the supervised learning paradigm, which requires large-scale, semantically labeled data. Constructing such datasets is one of the major challenges in developing Advanced Driver Assistance Systems (ADAS) because human annotation is expensive. In this paper, we explore whether unsupervised stereo-based cues can be used to learn high-level semantics for monocular road detection. Specifically, we estimate drivable space and surface normals from stereo images and use them as pseudo ground truth to train a convolutional neural network (CNN) in a multi-task learning scheme. Combining these self-supervision tasks enables the CNN to jointly encode knowledge of obstacles and the ground plane into a single frame. We demonstrate that the feature representation learned by our multi-task approach synergistically provides rich knowledge about geometric characteristics. Experiments on the KITTI road dataset show that our representation outperforms state-of-the-art road detection approaches.
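
The abstract does not spell out how the pseudo ground truth or the joint objective is computed, so the following is only a minimal NumPy sketch of one common way such a pipeline could look, not the authors' implementation. It assumes a rectified stereo pair with known intrinsics, converts disparity to depth with the standard relation Z = f * B / d, derives surface normals from the back-projected points, and combines a road term and a normal term into one loss. All function names, the `weight` parameter, and the choice of binary cross-entropy plus a cosine term are illustrative assumptions.

```python
import numpy as np

def depth_from_disparity(disparity, fx, baseline):
    """Standard rectified-stereo relation Z = fx * B / d (disparity must be > 0)."""
    return fx * baseline / disparity

def normals_from_depth(depth, fx, fy, cx, cy):
    """Unit surface normals (H, W, 3) from a depth map (H, W).

    Back-projects every pixel to a 3D point in camera coordinates, then takes
    the cross product of the finite-difference tangents along the image axes.
    With x right, y down, z forward, a fronto-parallel plane yields (0, 0, 1).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    points = np.stack([(u - cx) * depth / fx,
                       (v - cy) * depth / fy,
                       depth], axis=-1)
    # Tangent vectors along rows (v) and columns (u) of the image grid.
    d_v, d_u = np.gradient(points, axis=(0, 1))
    n = np.cross(d_u, d_v)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def multitask_loss(road_prob, road_pseudo, normal_pred, normal_pseudo, weight=1.0):
    """Hypothetical joint objective: BCE on road labels + (1 - cosine) on normals."""
    eps = 1e-7
    p = np.clip(road_prob, eps, 1.0 - eps)
    bce = -np.mean(road_pseudo * np.log(p) + (1.0 - road_pseudo) * np.log(1.0 - p))
    cosine = np.sum(normal_pred * normal_pseudo, axis=-1)
    return bce + weight * np.mean(1.0 - cosine)
```

In this sketch both pseudo labels come from the stereo geometry alone, which is the sense in which the supervision is "self" generated; the network itself would see only the single left (or right) image at test time.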

Original language: English
Title of host publication: 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
Publisher: IEEE Computer Society
ISBN (Electronic): 9781538617373
DOIs: https://doi.org/10.1109/ICME.2018.8486472
Publication status: Published - 2018 Oct 8
Event: 2018 IEEE International Conference on Multimedia and Expo, ICME 2018 - San Diego, United States
Duration: 2018 Jul 23 to 2018 Jul 27

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2018-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
Country: United States
City: San Diego
Period: 18/7/23 to 18/7/27

Fingerprint

Advanced driver assistance systems
Neural networks
Supervised learning
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Cho, L., Kim, Y., Jung, H., Oh, C., Youn, J., & Sohn, K. (2018). Multi-Task Self-Supervised Visual Representation Learning for Monocular Road Segmentation. In 2018 IEEE International Conference on Multimedia and Expo, ICME 2018 [8486472] (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2018-July). IEEE Computer Society. https://doi.org/10.1109/ICME.2018.8486472
@inproceedings{84be212ac4934364b9363f735f9c78d6,
title = "Multi-Task Self-Supervised Visual Representation Learning for Monocular Road Segmentation",
author = "Laehoon Cho and Youngjung Kim and Hyungjoo Jung and Changjae Oh and Jaesung Youn and Kwanghoon Sohn",
year = "2018",
month = "10",
day = "8",
doi = "10.1109/ICME.2018.8486472",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
booktitle = "2018 IEEE International Conference on Multimedia and Expo, ICME 2018",
address = "United States",

}

TY  - GEN
T1  - Multi-Task Self-Supervised Visual Representation Learning for Monocular Road Segmentation
AU  - Cho, Laehoon
AU  - Kim, Youngjung
AU  - Jung, Hyungjoo
AU  - Oh, Changjae
AU  - Youn, Jaesung
AU  - Sohn, Kwanghoon
PY  - 2018/10/8
Y1  - 2018/10/8
UR  - http://www.scopus.com/inward/record.url?scp=85061427175&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85061427175&partnerID=8YFLogxK
U2  - 10.1109/ICME.2018.8486472
DO  - 10.1109/ICME.2018.8486472
M3  - Conference contribution
AN  - SCOPUS:85061427175
T3  - Proceedings - IEEE International Conference on Multimedia and Expo
BT  - 2018 IEEE International Conference on Multimedia and Expo, ICME 2018
PB  - IEEE Computer Society
ER  -
