Public areas are among the most important places requiring video surveillance. However, pixel-wise adaptive background subtraction methods are disturbed by foreground objects that pass incessantly or stay temporarily, precisely because of their adaptability. In such environments, even the initialization of the background is not free from the influence of foreground objects. If the adaptability is carelessly modified for selective learning, the stability of the background model is damaged. Adjusting or fusing learning rates slows down false learning but cannot solve the problem. In this paper, we present a multilayer background modeling algorithm for public area surveillance. By comparing inputs with the background layer, we efficiently cluster regions object-wise using spatiotemporal cohesion together with spectral similarity. We then classify the clustered regions and update the multilayer model according to the results. Using the PETS data, we show that the proposed method not only maintains the background robustly but also initializes the background with stationary object detection in crowded public areas.
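The limitation of learning-rate tuning noted above can be illustrated with a minimal sketch. It is not the proposed multilayer algorithm; it assumes a generic exponential running-average model for a single pixel, and the function name, intensity values, and threshold are hypothetical choices made only for illustration. A stationary foreground object is eventually absorbed into the background for any positive learning rate; a smaller rate only delays the absorption.

```python
def frames_until_absorbed(alpha, background=50.0, foreground=200.0,
                          threshold=25.0, max_frames=100_000):
    """Count frames until a stationary foreground pixel is absorbed.

    Assumed model (not the paper's): a per-pixel running average
    model <- (1 - alpha) * model + alpha * observation,
    with a pixel labeled foreground while |observation - model| >= threshold.
    """
    model = background
    for frame in range(1, max_frames + 1):
        # Pixel-wise detection: the object is lost once it matches the model.
        if abs(foreground - model) < threshold:
            return frame
        # Blind adaptive update: the model drifts toward the stationary object.
        model = (1.0 - alpha) * model + alpha * foreground
    return None  # not absorbed within max_frames


fast = frames_until_absorbed(alpha=0.05)
slow = frames_until_absorbed(alpha=0.005)
print(f"alpha=0.05  -> absorbed at frame {fast}")
print(f"alpha=0.005 -> absorbed at frame {slow}")
```

Lowering `alpha` postpones the false learning but does not prevent it, and it also makes the model slower to follow genuine background changes, which is the trade-off that motivates handling stationary objects separately from the background.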