After removing two outliers, the correlation between scores on exam 1 and 2 in the class I’m teaching was 0.24. This seems surprisingly low to me; has anyone with more experience seen similar numbers before? The correlation between p-set scores and each of the exams separately is higher (about 0.3); the correlation between p-set scores and the average of the two exam scores goes up (to 0.35), which is not surprising to me.
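(For what it's worth, the jump to 0.35 is roughly what you'd expect from averaging alone. This is my own back-of-envelope, assuming all scores are standardized to unit variance: the correlation of the p-set score with the average of two equally correlated exams is (r_p1 + r_p2) / sqrt(2 + 2·r_12).)

```python
import math

# Back-of-envelope check (assumption: all scores standardized to unit variance).
# corr(pset, (exam1 + exam2)/2) = (r_p1 + r_p2) / sqrt(2 + 2 * r_12)
r_12 = 0.24          # exam 1 vs. exam 2
r_p1 = r_p2 = 0.3    # p-set vs. each exam (approximate)

r_p_avg = (r_p1 + r_p2) / math.sqrt(2 + 2 * r_12)
print(round(r_p_avg, 3))  # predicts roughly 0.38, close to the observed 0.35
```

So the observed 0.35 is in the right ballpark for this simple model.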
This is not a completely silly question, since exam statistics do reveal something about exam construction and grading (if not necessarily about the quality of teaching and learning going on). The most common example of this I'm aware of is bimodality: it's extremely common for math exams to have bimodal score distributions. This is because a typical first draft of a math exam contains questions from a variety of areas, but at a relatively uniform level of difficulty (nothing super hard, nothing really easy). Students who can operate at that difficulty level get most of the questions right, and students who can't miss most of them, so scores cluster into two groups. Thus, one tends to pick up a signal associated with the ability to solve math questions at a given difficulty level, independent of subject matter, in addition to the signal from content mastery. Giving questions at a wider variety of difficulty levels tends to smush the two peaks together. But I don't know a similar story for inter-exam correlation.
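The uniform-difficulty story is easy to see in a toy simulation (my own sketch, not from the discussion above; the logistic item model and every parameter are assumptions): give each student a scalar ability and have them answer each question correctly with probability given by a steep logistic in (ability − difficulty). Putting every item at one difficulty pushes scores toward 0% or 100%; spreading the difficulties fills in the middle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_q = 2000, 20
ability = rng.normal(size=n_students)  # one latent ability per student

def sim_scores(difficulties, k=5.0):
    # P(correct) is a steep logistic in (ability - difficulty);
    # large k means items discriminate sharply at their difficulty level.
    p = 1 / (1 + np.exp(-k * (ability[:, None] - difficulties[None, :])))
    return (rng.random((n_students, n_q)) < p).mean(axis=1)  # fraction correct

uniform_exam = sim_scores(np.zeros(n_q))            # all items at one difficulty
spread_exam  = sim_scores(np.linspace(-2, 2, n_q))  # items span easy to hard

def midband(scores):
    # share of students scoring between 30% and 70%
    return ((scores > 0.3) & (scores < 0.7)).mean()
```

Histograms of the two score vectors show the contrast: `uniform_exam` leaves far fewer students in the middle band than `spread_exam`, i.e. two peaks at the extremes versus one hump.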