Training deep networks commonly follows the supervised learning paradigm, which requires large-scale semantically labeled data. Constructing such datasets is one of the major challenges in developing Advanced Driver Assistance Systems (ADAS) because of the expense of human annotation. In this paper, we explore whether unsupervised stereo-based cues can be used to learn high-level semantics for monocular road detection. Specifically, we estimate drivable space and surface normals from stereo images and use them as pseudo ground truth to train a convolutional neural network (CNN) in a multi-task learning scheme. Combining these self-supervision tasks enables the CNN to jointly encode knowledge of obstacles and the ground plane from a single frame. We demonstrate that the feature representation learned by our multi-task approach synergistically provides rich knowledge of geometric characteristics. Experiments on the KITTI road dataset show that our representation outperforms state-of-the-art road detection approaches.
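The multi-task idea in the abstract can be illustrated with a small sketch of a combined training objective. The abstract does not state which loss functions or task weights the authors use, so everything below is an assumption made for illustration: binary cross-entropy for the drivable-space task, a cosine-distance loss for the surface-normal task, and the hypothetical names `bce`, `cosine_loss`, `multi_task_loss`, `w_space`, and `w_normal`. It operates on plain Python lists standing in for per-pixel network outputs and stereo-derived pseudo labels.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 pseudo labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def cosine_loss(pred_normals, target_normals, eps=1e-12):
    """Mean (1 - cosine similarity) between predicted and pseudo-label normals."""
    total = 0.0
    for a, b in zip(pred_normals, target_normals):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        total += 1.0 - dot / (norm_a * norm_b + eps)
    return total / len(pred_normals)


def multi_task_loss(space_pred, space_label, normal_pred, normal_label,
                    w_space=1.0, w_normal=1.0):
    """Weighted sum of the two self-supervised task losses (weights are assumed)."""
    return (w_space * bce(space_pred, space_label)
            + w_normal * cosine_loss(normal_pred, normal_label))


if __name__ == "__main__":
    # Three "pixels": drivable-space probabilities and stereo-derived 0/1 labels.
    space_pred = [0.9, 0.2, 0.7]
    space_label = [1, 0, 1]
    # Predicted and pseudo-label surface normals (x, y, z) for the same pixels.
    normal_pred = [(0.0, 1.0, 0.0), (0.1, 0.9, 0.0), (1.0, 0.0, 0.0)]
    normal_label = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    print(multi_task_loss(space_pred, space_label, normal_pred, normal_label))
```

In a real training setup, both terms would be computed by a deep learning framework over full image tensors, and the relative weights would need tuning; the sketch only shows how two pseudo-labeled tasks can share a single scalar objective.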
|Title of host publication||2018 IEEE International Conference on Multimedia and Expo, ICME 2018|
|Publisher||IEEE Computer Society|
|Publication status||Published - 2018 Oct 8|
|Event||2018 IEEE International Conference on Multimedia and Expo, ICME 2018 - San Diego, United States|
Duration: 2018 Jul 23 → 2018 Jul 27
|Name||Proceedings - IEEE International Conference on Multimedia and Expo|
Bibliographical note
Funding Information:
This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2017M3C4A7069370). (†Corresponding author: email@example.com.)
© 2018 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Computer Science Applications