Background modeling and subtraction is a fundamental research topic in computer vision. Pixel-level background models use a Gaussian mixture model (GMM) or kernel density estimation to represent the distribution of each pixel value. Each pixel is processed independently, which makes these models very efficient. However, they are not robust to noise caused by sudden illumination changes. Region-based background models use local texture information around a pixel to suppress this noise, but they are vulnerable to periodic changes of pixel values and are relatively slow. A straightforward combination of the two approaches cannot retain the advantages of both. This paper proposes a real-time integration based on a robust estimator. A recent, efficient aggregation technique based on minimum spanning trees is used to enable robust estimators such as the M-smoother to run in real time and to effectively suppress the noisy background estimates obtained from Gaussian mixture models. The refined background estimates are then used to update the Gaussian mixture model at each pixel location. Additionally, optical flow estimation can be used to track the foreground pixels and can be integrated with a temporal M-smoother to ensure temporally consistent background subtraction. The algorithm is evaluated on both synthetic and real-world benchmarks, where it is the top performer.
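To illustrate how a robust estimator can suppress isolated errors in per-pixel background estimates, the following is a minimal sketch and not the method of this paper. It assumes an L1 loss with uniform weights over a small square window, so the M-smoother reduces to a local median, and it uses brute-force window aggregation in place of the minimum-spanning-tree aggregation. The function name `m_smooth`, the `radius` parameter, and the toy input are hypothetical.

```python
from statistics import median


def m_smooth(estimates, radius=1):
    """Replace each value by the median of its square neighbourhood.

    The median minimises the sum of absolute deviations, so this is an
    M-smoother with an L1 loss and uniform weights (assumptions of this
    sketch, not choices taken from the paper).
    """
    h, w = len(estimates), len(estimates[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the window around (x, y), clipped at the image border.
            window = [
                estimates[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = median(window)
    return out


# Toy per-pixel background estimates (1.0 = background) with one
# isolated pixel wrongly marked as foreground.
noisy = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
smoothed = m_smooth(noisy)
```

In this toy input the isolated outlier is outvoted by its neighbours, which is the behaviour a robust estimator is meant to provide. A real-time system would need a much faster aggregation scheme than this nested loop.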