Environment recognition has been an important topic ever since the emergence of augmented reality (AR). For a better experience in AR applications, environment recognition should be provided in real time, a requirement that real-time object detection technologies can fulfill. However, training object detectors for AR-specific scenarios is often troublesome. The real-time nature of AR produces visual degradations such as motion blur and occlusion caused by interaction, which make it difficult for detectors trained on plain data to detect objects in such complex situations. Moreover, since gathering and labeling training data from scratch is a heavy burden, we must resort to synthesized training data, yet previous synthetic data generation frameworks do not address the aforementioned issue. Therefore, in this paper, we propose a new synthetic data generation framework that includes visual variations such as motion blur and occlusion caused by distractors. With this simple modification, we show that adding such varied data to the training dataset can improve the real-time performance of object detectors by a large margin. We also show that synthesizing training data with no more than three objects per image achieves performance competitive with detectors trained on images containing four or more objects. Experimental results, both quantitative and qualitative, support our claims and demonstrate the superiority of our method.