Training deep networks commonly follows the supervised learning paradigm, which requires large-scale semantically labeled data. Constructing such datasets is one of the major challenges in developing Advanced Driver Assistance Systems (ADAS) due to the expense of human annotation. In this paper, we explore whether unsupervised stereo-based cues can be used to learn high-level semantics for monocular road detection. Specifically, we estimate drivable space and surface normals from stereo images and use them as pseudo ground truth to train a convolutional neural network (CNN) in a multi-task learning scheme. Combining these self-supervision tasks enables the CNN to jointly encode knowledge of obstacles and the ground plane from a single frame. We demonstrate that the feature representation learned by our multi-task approach synergistically captures rich knowledge of the scene's geometric characteristics. Experiments on the KITTI road dataset show that our representation outperforms state-of-the-art road detection approaches.
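
The multi-task objective described above can be sketched as a weighted sum of a drivable-space term and a surface-normal term, each supervised by stereo-derived pseudo labels. The sketch below is a hypothetical, simplified illustration (per-pixel binary cross-entropy for drivable space and a cosine loss for normals); the function and parameter names are our own and the paper's actual loss formulation may differ.

```python
# Hypothetical sketch of a multi-task self-supervision objective:
# BCE on stereo-derived drivable-space pseudo labels plus a cosine loss
# on stereo-derived surface-normal pseudo labels. Names and the per-pixel
# loop are illustrative, not the authors' implementation.
import math

def bce(p, y):
    """Binary cross-entropy for one pixel (p: predicted probability, y: 0/1 pseudo label)."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cosine_loss(n_pred, n_gt):
    """1 - cos(angle) between a predicted unit normal and its pseudo-GT unit normal."""
    return 1.0 - sum(a * b for a, b in zip(n_pred, n_gt))

def multitask_loss(drivable_pred, drivable_pseudo, normals_pred, normals_pseudo,
                   w_drivable=1.0, w_normal=1.0):
    """Average each task's per-pixel loss, then combine with task weights."""
    l_d = sum(bce(p, y) for p, y in zip(drivable_pred, drivable_pseudo)) / len(drivable_pred)
    l_n = sum(cosine_loss(p, g) for p, g in zip(normals_pred, normals_pseudo)) / len(normals_pred)
    return w_drivable * l_d + w_normal * l_n

# Toy 2-pixel example: confident drivable predictions, perfectly aligned normals.
loss = multitask_loss([0.9, 0.1], [1, 0],
                      [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
                      [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)])
```

In practice both terms would be computed densely over feature maps from a shared CNN encoder with two task heads, so the encoder is pushed to learn a representation useful for both obstacle and ground-plane reasoning.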