The emotional design hypothesis states that features of digital learning materials can elicit emotional reactions. Voice-related cues, such as enthusiastic accentuation, are increasingly becoming a focus of research. These emotion-related cues have to be processed in addition to the relevant instructional content during learning, so the overall amount of to-be-processed information might moderate the effect of emotional cues. One hundred eighteen participants were assigned to one cell of a 2 (enthusiasm of a pedagogical agent: enthusiastic vs. neutral) x 2 (mental load in the learner's working memory: high vs. low) factorial between-subjects design. On the multiple-choice learning test, learners with a neutral agent voice performed better in the high-load condition, whereas learners exposed to an enthusiastic voice achieved higher learning scores in the low-load condition. However, this moderating effect of mental load could not be shown for open-answer questions.