Validity of comparative judgement to assess academic writing: examining implications of its holistic character and building on a shared consensus
Tine van Daal, Marije Lesterhuis, Liesje Coertjens, Vincent Donche and Sven De Maeyer
Assessment in Education: Principles, Policy & Practice, 2016
Recently, comparative judgement has been introduced as an alternative method for scoring essays. Although this method is promising in terms of obtaining reliable scores, empirical evidence concerning its validity is lacking. The current study examines implications resulting from two critical assumptions underpinning the use of comparative judgement, namely its holistic character and how the final rank order reflects the shared consensus on what makes for a good essay. Judges’ justifications that underpin their decisions are qualitatively analysed to obtain insight into the dimensions of academic writing they take into account. The results show that most arguments are directly related to the competence description. However, judges also use their expertise in order to judge the quality of essays. Additionally, judges differ in terms of how they conceptualise writing quality, and regarding the extent to which they tap into their own expertise. Finally, this study explores the diverging conceptualisations of misfitting judges.
Exploring construct validity within comparative judgement: a case of argumentative writing
Marije Lesterhuis, Vincent Donche, Sven De Maeyer, Tine van Daal, Roos Van Gasse and Liesje Coertjens
Submitted to Language Assessment Quarterly
Comparative judgement has recently been introduced to the domain of writing assessment, wherein the assessors are encouraged to use their own conceptions of quality while judging. A set of assessors each make a series of text comparisons, which results in a scale that represents a rich construct of the different conceptions of quality of texts among assessors. This assumption of construct validity is tested in this study, by means of a case of argumentative writing. The reasons for choosing one text over the other are analysed content-wise to investigate key assumptions of the use of assessors’ own conceptions of quality and the enhancement of construct representation by involving a group of assessors. The results show that assessors refer to a wide range of features of texts, which implies that the final scale represents a richer and more varied representation of the writing construct compared to what was suggested by the competence description. However, aspects of writing are addressed in unequal shares. Both findings have implications for construct validity. This explorative study on comparative judgement shows how judgement decisions can be studied in-depth, and how these insights can support the valid interpretation and use of scales derived from comparative judgement.