While numerous algorithms have been proposed for object tracking with demonstrated success, it remains a challenging problem for a tracker to handle large change in scale, motion, shape deformation with occlusion. One of the main reasons is the lack of effective image representation to account for appearance variation. Most trackers use high-level appearance structure or low-level cues for representing and matching target objects. In this paper, we propose a tracking method from the perspective of mid-level vision with structural information captured in superpixels. We present a discriminative appearance model based on superpixels, thereby facilitating a tracker to distinguish the target and the background with mid-level cues. The tracking task is then formulated by computing a target-background confidence map, and obtaining the best candidate by maximum a posterior estimate. Experimental results demonstrate that our tracker is able to handle heavy occlusion and recover from drifts. In conjunction with online update, the proposed algorithm is shown to perform favorably against existing methods for object tracking.