3D scene modeling is a challenging problem and has been one of the most important research topics for many years. In this paper, we describe a 3D scene reconstruction system that creates 3D models from multiple stereo image pairs acquired by a hand-held device. Our algorithm consists of two steps: depth reconstruction and model registration. In the first step, we obtain a depth map for each view using stereo matching and camera geometry. The matching algorithm is based on adaptive window methods in a hierarchical framework. In the second step, we use SIFT features to estimate the camera motion, and the LMedS (least median of squares) algorithm reduces the effect of outliers in this process. Experimental results show that the proposed algorithm produces accurate disparity maps for various types of images, as well as 3D models of real-world scenes.
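
As a brief illustration, the two steps rest on standard relations, sketched here under assumptions not stated in the text: a rectified stereo pair with focal length $f$ (in pixels) and baseline $B$, a disparity $d > 0$ at a given pixel, and residuals $r_i(\theta)$ of the matched SIFT features under a candidate camera motion $\theta$. These symbols are illustrative and may differ from the notation used in the paper itself.

\[
Z = \frac{f\,B}{d},
\qquad
\hat{\theta} = \arg\min_{\theta}\; \operatorname{med}_{i}\; r_i(\theta)^{2}
\]

The first relation converts a disparity into a depth $Z$. The second is the least-median-of-squares criterion: because it minimizes the median rather than the sum of the squared residuals, it can tolerate a large fraction of mismatched features (up to roughly half) when estimating the motion.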