Korean high school students (Experiment 1) and college students (Experiment 2a) received a 16‐minute lesson on Antarctica that consisted of either English audio only (audio group) or English audio with corresponding video depicting the scenes and objects described in the audio (audio + video group). On a subsequent comprehension test, the audio + video group scored significantly higher than the audio group in Experiment 1 (d = 0.33) and marginally higher in Experiment 2a (d = 0.42). The mean difficulty rating of the audio + video group was significantly lower than that of the audio group (d = 0.62 in Experiment 1 and d = 0.96 in Experiment 2a), and its mean effort rating was significantly higher (d = 0.60 in Experiment 1 and d = 0.79 in Experiment 2a). When the audio was in Korean, the comprehension scores of college students did not benefit from added video (d = −0.03 in Experiment 2b). Copyright © 2015 John Wiley & Sons, Ltd.