Deep convolutional neural networks (CNNs) have shown revolutionary performance improvements for matching cost computation in stereo matching. However, conventional CNN-based approaches train the network in a supervised manner and therefore require a large number of ground-truth disparity maps, which limits their applicability. To overcome this limitation, we present a novel framework that learns a CNN architecture for matching cost computation in an unsupervised manner. Our method combines image domain learning with stereo epipolar constraints. Using the correspondence consistency between stereo images as supervision, our method selects training samples at each iteration of network training and uses them to update the network. To further boost performance, we also propose a multi-scale cost computation scheme. Experimental results show that our method outperforms state-of-the-art methods, including supervised learning-based methods, on various benchmarks.