Multiple-choice questions are widely used in educational assessments, but when they contain item-writing flaws they can negatively impact student learning and skew learning analytics. Existing methods for evaluating multiple-choice questions in educational contexts tend to focus primarily on machine readability metrics, such as grammar, syntax, and formatting, without considering how the questions are used within course materials or their pedagogical implications. In this study, we present the results of crowdsourcing the evaluation of multiple-choice questions against 15 common item-writing flaws. Analyzing 80 crowdsourced evaluations of questions from the domains of calculus and chemistry, we found that crowdworkers evaluated the questions accurately, matching expert evaluations on 75% of questions. They correctly distinguished between two levels of Bloom's Taxonomy for the calculus questions but were less accurate for the chemistry questions. We discuss how to scale this question evaluation process and its implications for other domains. This work demonstrates that crowdworkers, regardless of prior experience or domain knowledge, can be leveraged in the quality evaluation of educational questions.