This paper presents a study of how human subjects view multimedia content containing both audio and video channels when the content is corrupted by video transmission errors. The study addresses the human perceptual mechanism in realistic multimedia delivery applications over networks. We design an eye-tracking experiment using several high-definition audio-visual content items with a wide range of content characteristics. The results are analyzed in terms of the amount of attention received by each of three regions (the sound source region, the region corrupted by packet loss artifacts, and the rest) under two audio conditions, i.e., with and without the audio channel. The results show that the effect of the audio channel on the gaze pattern toward packet loss artifacts varies with the content. In addition, observations on temporal variation, observer dependence, and content dependence are reported.
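
The region-based analysis described above can be illustrated with a minimal sketch. This is not the procedure used in the study. It assumes, purely for illustration, that each region is an axis-aligned rectangle, that gaze samples are (x, y) pixel coordinates, and that a sample falling in both rectangles is counted toward the artifact region. The names `attention_shares`, `in_box`, and all coordinates are hypothetical.

```python
def in_box(point, box):
    """Return True if an (x, y) point lies inside a (left, top, right, bottom) box."""
    x, y = point
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


def attention_shares(gaze_points, sound_box, artifact_box):
    """Fraction of gaze samples falling in each of the three regions.

    Assumption: rectangular regions; the artifact region takes priority
    when the two rectangles overlap.
    """
    counts = {"sound_source": 0, "artifact": 0, "rest": 0}
    for point in gaze_points:
        if in_box(point, artifact_box):
            counts["artifact"] += 1
        elif in_box(point, sound_box):
            counts["sound_source"] += 1
        else:
            counts["rest"] += 1
    total = len(gaze_points)
    if total == 0:
        # No samples: report zero attention everywhere instead of dividing by zero.
        return {region: 0.0 for region in counts}
    return {region: n / total for region, n in counts.items()}


# Hypothetical regions and gaze samples for one frame, in pixels.
sound_box = (50, 50, 200, 200)
artifact_box = (450, 350, 600, 450)
with_audio = [(100, 100), (110, 120), (500, 400), (900, 50)]
without_audio = [(500, 400), (510, 410), (100, 100), (900, 50)]

for label, samples in (("with audio", with_audio), ("without audio", without_audio)):
    print(label, attention_shares(samples, sound_box, artifact_box))
```

Comparing the shares obtained under the two audio conditions, per content item and per observer, is one simple way to express the kind of comparison the abstract refers to.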