In this paper, we present a robust framework for facial pose estimation from binocular stereo vision. Unlike prior work on facial pose estimation, which uses all detected landmarks even when some are mislocalized, we propose a landmark selection method that removes erroneous landmarks to improve performance, especially for large facial poses. To this end, we train a convolutional neural network (CNN) to measure the confidence of each facial landmark detected by a well-known landmark detection algorithm. In addition, by fitting the selected landmarks in 3D space, our framework remains robust even when only a small number of landmarks are selected. Because no public dataset exists for binocular stereo facial pose estimation, we construct facial pose datasets using a motion sensor for performance validation. In our experiments, our method achieves higher pose estimation accuracy than the previous method, especially for large facial poses.
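
To make the pipeline described above concrete, the following is a minimal sketch of its three stages: confidence-based landmark selection, stereo triangulation, and 3D fitting. It is an illustration under stated assumptions, not the implementation evaluated in this paper. The per-landmark confidences are assumed to come from some trained CNN and are passed in as plain numbers; the threshold of 0.5, the rectified pinhole stereo model, the reference 3D landmark model, and the use of the Kabsch algorithm for rigid fitting are all assumptions introduced for this example, as are the function names.

```python
import numpy as np


def select_landmarks(confidences, threshold=0.5):
    """Return indices of landmarks whose confidence is at least `threshold`.

    `confidences` stands in for the CNN output; the threshold is an assumed value.
    """
    return np.flatnonzero(np.asarray(confidences, dtype=float) >= threshold)


def triangulate_rectified(left_xy, right_xy, focal, baseline, cx, cy):
    """Triangulate 3D points from a rectified stereo pair (assumed pinhole model).

    Depth follows from horizontal disparity: z = focal * baseline / disparity.
    """
    left_xy = np.asarray(left_xy, dtype=float)
    right_xy = np.asarray(right_xy, dtype=float)
    disparity = left_xy[:, 0] - right_xy[:, 0]
    z = focal * baseline / disparity
    x = (left_xy[:, 0] - cx) * z / focal
    y = (left_xy[:, 1] - cy) * z / focal
    return np.stack([x, y, z], axis=1)


def fit_rigid_pose(model_pts, observed_pts):
    """Kabsch algorithm: find R, t minimizing ||R @ model + t - observed||^2."""
    model_pts = np.asarray(model_pts, dtype=float)
    observed_pts = np.asarray(observed_pts, dtype=float)
    mu_m = model_pts.mean(axis=0)
    mu_o = observed_pts.mean(axis=0)
    # Cross-covariance between the centered point sets.
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_m
    return R, t


def estimate_pose(model_pts, left_xy, right_xy, confidences,
                  focal, baseline, cx, cy, threshold=0.5):
    """Select confident landmarks, triangulate them, and fit the rigid pose."""
    idx = select_landmarks(confidences, threshold)
    if idx.size < 3:
        raise ValueError("at least three landmarks are needed for a rigid fit")
    pts = triangulate_rectified(np.asarray(left_xy)[idx], np.asarray(right_xy)[idx],
                                focal, baseline, cx, cy)
    return fit_rigid_pose(np.asarray(model_pts)[idx], pts)
```

In this sketch, a rigid fit needs only three non-collinear landmarks, which is one way to see why a 3D fitting step can tolerate discarding many low-confidence landmarks.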