In multilevel research on classroom instruction, individual student ratings are often aggregated to the class level to obtain a representative indicator of the classroom construct under study. Whether students within a class provide ratings consistent enough to justify aggregation, however, has received little research attention. Drawing on data from N = 9,524 students from 391 classes who participated in the national extension to the PISA 2006 study in Germany, the interrater reliability and interrater agreement of student ratings of science instruction were examined. Results showed that students within a class rated various aspects of their science lessons accurately and reliably. However, agreement among ratings was influenced by class size, learning time, school track, and science performance. In multiple regression analyses, science performance proved particularly important in accounting for differences in the homogeneity of ratings. The findings suggest that agreement among students' perceptions of instruction should be a central consideration for researchers using aggregated measures to examine classroom teaching.
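The abstract names two distinct index families: interrater reliability and interrater agreement. Purely as an illustration (the study's own analysis code is not given here), the following minimal Python sketch computes the standard indices of each kind, ICC(1) and ICC(2) for reliability and rWG(J) for agreement, on simulated class-nested ratings; all function names, parameters, and data are assumptions for the sketch, not the authors' implementation.

```python
# Minimal sketch of reliability (ICC) and agreement (rWG(J)) indices for
# student ratings nested in classes. Simulated data; illustrative only.
import numpy as np

def icc_oneway(groups):
    """ICC(1) and ICC(2) from a one-way random-effects ANOVA.

    groups: list of 1-D arrays, one array of student ratings per class.
    """
    k = len(groups)
    sizes = np.array([len(g) for g in groups])
    n = sizes.sum()
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # n0: adjusted average class size for unbalanced designs
    n0 = (n - (sizes ** 2).sum() / n) / (k - 1)
    icc1 = (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between  # reliability of class means
    return icc1, icc2

def rwg_j(class_ratings, n_items, n_options):
    """rWG(J) agreement for one class rating J items on an A-point scale.

    class_ratings: 2-D array, students x items.
    Null (no-agreement) variance: uniform distribution over the A options.
    """
    sigma_eu = (n_options ** 2 - 1) / 12.0           # expected uniform variance
    mean_var = class_ratings.var(axis=0, ddof=1).mean()
    ratio = mean_var / sigma_eu
    return (n_items * (1 - ratio)) / (n_items * (1 - ratio) + ratio)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulate 20 classes of 25 students rating 5 items on a 4-point scale
    classes = []
    for _ in range(20):
        class_mean = rng.normal(2.5, 0.5)
        ratings = np.clip(np.rint(rng.normal(class_mean, 0.7, size=(25, 5))), 1, 4)
        classes.append(ratings)
    # ICC uses each student's scale mean; rWG(J) uses the item-level matrix
    icc1, icc2 = icc_oneway([c.mean(axis=1) for c in classes])
    rwgs = [rwg_j(c, n_items=5, n_options=4) for c in classes]
    print(f"ICC(1)={icc1:.2f}  ICC(2)={icc2:.2f}  mean rWG(J)={np.mean(rwgs):.2f}")
```

In this framing, ICC(2) speaks to whether the aggregated class mean is a reliable score, while rWG(J) speaks to the within-class consensus the abstract highlights; the regression of agreement on class size, learning time, track, and performance would operate on per-class values such as the rWG(J) estimates above.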