
General

Comparative Judgement as a promising alternative to score competences

Marije Lesterhuis, San Verhavert, Liesje Coertjens, Sven De Maeyer and Vincent Donche

 

Chapter in Innovative Practices for Higher Education Assessment and Measurement, 2017

Cano & Ion

 

ABSTRACT

To adequately assess students’ competences, students are asked to provide proof of a performance. Ideally, open and real-life tasks are used for such performance assessment. However, to increase the reliability of the scores resulting from performance assessment, assessments are mostly standardised, which hampers their validity. Comparative judgement (CJ) is introduced as an alternative judging method that does not require standardisation of tasks. The CJ method is based on the assumption that people can compare two performances more easily and more reliably than they can assign a score to a single one. This chapter provides insight into the method and elaborates on why it is a promising way to generate valid, reliable measures efficiently, especially for large-scale summative assessments. In doing so, it brings together the research already conducted in this new assessment domain.
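The abstract does not spell out how pairwise judgements become "measures". In the CJ literature the comparison outcomes are typically fitted with a Bradley-Terry (or Thurstone) model, whose estimated strengths define the scale. Below is a minimal Python sketch under that assumption, using the classic MM update and hypothetical performance labels; any particular CJ tool may estimate this differently.

```python
from collections import defaultdict

def bradley_terry(comparisons, n_iter=200, tol=1e-8):
    """Estimate one quality score per performance from pairwise judgements.

    comparisons: list of (winner, loser) tuples, one per judgement.
    Returns {performance_id: strength}; higher means judged better.
    Uses the classic MM update for the Bradley-Terry model and assumes
    the comparison graph is connected and every item wins at least once.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            # Sum over every opponent j that i was compared against.
            denom = sum(
                count / (strength[i] + strength[j])
                for pair, count in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            new[i] = wins[i] / denom
        mean = sum(new.values()) / len(new)  # fix the scale's unit
        new = {i: v / mean for i, v in new.items()}
        if max(abs(new[i] - strength[i]) for i in items) < tol:
            return new
        strength = new
    return strength

# Hypothetical judgements: A beat B twice, A beat C, C and B split.
judgements = [("A", "B"), ("A", "B"), ("A", "C"), ("C", "B"), ("B", "C")]
scores = bradley_terry(judgements)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'C', 'B']
```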

 

High-quality assessment of competences with D-PAC

Sven De Maeyer, Renske Bouwer, Roos Van Gasse and Maarten Goossens

 

Examens, 2017

Examens

 

ABSTRACT

Competence-based education entered the educational landscape in the late 1970s. This form of education centres on the integration of knowledge, skills, and attitudes, and on the ability to apply these in meaningful, authentic situations. Its goal is to prepare people better for a rapidly changing society. A frequently heard criticism of competence-based education, however, is that its assessment methods remain stuck at the level of easily measured knowledge questions and fail to take into account the context in which a competence is expressed. In this way, competence-based education defeats its own purpose: how can you know how competences develop if you do not measure them properly? In this article we discuss the main stumbling blocks in assessing competences and show how D-PAC offers a promising alternative.

 

High-quality assessment of competences: does a comparative approach offer a solution?

Marije Lesterhuis, Vincent Donche, Sven De Maeyer, Tine van Daal, Roos Van Gasse, Liesje Coertjens, Anneleen Mortier, Tanguy Coenen, Peter Vlerick, Jan Vanhoof and Peter Van Petegem

 

Tijdschrift voor Hoger Onderwijs, 2015

TvHO

 

ABSTRACT

In higher education, performance assessments are increasingly used to evaluate students’ competences. Teachers struggle, however, with how best to score these performance assessments. Usually a combination of criteria lists and holistic scoring is used, but this method does not always lead to reliable results. Problems also arise with validity, because the competence is not approached as a whole. This article discusses these problems and proposes an alternative approach: comparative judgement (CJ). In CJ, assessors are asked to compare students’ performances and indicate which one performs best in terms of the competence being assessed. By resolving multiple comparisons, a rank-order can be generated from best to weakest performance. Reliability is pursued by having multiple assessors judge each performance multiple times. In addition, the method claims to be more valid, because assessors base their choice on a holistic evaluation of the performances and the task can be formulated more openly for students. Although research shows that CJ is a promising alternative, more research into its reliability, validity, and practical feasibility is needed.
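To make "resolving multiple comparisons yields a rank-order" concrete, the sketch below ranks hypothetical performances by the share of comparisons each one won. This is a deliberate simplification: CJ tools usually fit a Bradley-Terry-style model (see the sketch under the first abstract above), but the idea of aggregating many pairwise wins into one ordering is the same.

```python
from collections import Counter

# Hypothetical judgements from several assessors: (winner, loser) pairs.
judgements = [
    ("essay1", "essay2"), ("essay1", "essay3"), ("essay2", "essay3"),
    ("essay3", "essay2"), ("essay1", "essay2"), ("essay2", "essay3"),
]

wins = Counter(winner for winner, _ in judgements)
appearances = Counter(p for pair in judgements for p in pair)

# Rank from best to weakest by the share of comparisons each one won.
rank_order = sorted(appearances,
                    key=lambda p: wins[p] / appearances[p],
                    reverse=True)
print(rank_order)  # ['essay1', 'essay2', 'essay3']
```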

 

The reliability and validity of peer assessment based on comparative judgements

Maarten Goossens, Renske Bouwer and Sven De Maeyer

 

Conference paper at Assessment in Higher Education (Manchester, 2017)

AHE

 

ABSTRACT

Peer assessment (PA) is considered to be an effective instructional practice to promote students’ learning (Dochy, Segers, & Sluijsmans, 1999). However, creating an effective PA is a difficult task. To overcome this difficulty, comparative judgement (CJ) was introduced in PA (Jones & Alcock, 2014). In CJ, performances are not given scores; instead, pairs of performances are compared and the best of each pair is selected in a holistic manner. Based on the pairwise comparisons of multiple assessors, performances can be ranked on a scale from low to high quality (Pollitt, 2012). This process of comparing is considered an easier task than assigning scores to a single object (Thurstone, 1927), which makes CJ suitable for PA. The present study combines two PA studies in different contexts. The first study covers a PA of Entity-Relationship models (engineering); the second entails a PA of mood boards (architecture), both at the University of Antwerp. The research questions of both studies are: 1) To what extent are students able to generate a reliable rank-order using CJ? 2) To what extent do students generate a rank-order similar to their teachers’ using CJ? Results show that students are capable of creating reliable rank-orders in both contexts. In the case of the mood boards, the group of students (N=38) was divided into two groups, each creating a separate rank-order (SSR = .79 and .75). A group of tutors (N=5) also created a rank-order (SSR = .76). In the case of the Entity-Relationship models, students (N=27) produced a rank-order with a reliability of .79. Likewise, tutors (N=4) generated a rank-order (SSR = .76). To answer the second RQ, Spearman rank correlations between the rank-orders of the students and the tutors were calculated. This resulted in ρs = .59 (p < .001) and ρs = .58 (p < .001) for the mood boards, and ρs = .58 (p < .001) for the Entity-Relationship models. We can conclude that students are capable of generating reliable rank-orders using CJ. However, students do not generate rank-orders similar to their tutors’. Students probably value different aspects of the performances than the tutors do, or are more easily distracted by salient characteristics of the performances. Nevertheless, CJ can have its affordances for peer assessment, as Pachur and Olsson (2012) state that comparing pairs of items is a more effective learning strategy than comparing individual items against criteria.
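For reference, a Spearman rank correlation like the ones reported above can be computed as follows. The rank-orders here are made up for illustration and are not the study’s data.

```python
from scipy.stats import spearmanr

# Hypothetical ranks given to the same ten performances by the student
# group and the tutor group (1 = best); not the study's actual data.
student_ranks = [1, 3, 2, 5, 4, 7, 6, 9, 10, 8]
tutor_ranks   = [2, 1, 4, 3, 6, 5, 8, 7, 9, 10]

rho, p_value = spearmanr(student_ranks, tutor_ranks)
print(f"rho_s = {rho:.2f}, p = {p_value:.4f}")
```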

 

The reliability and validity of comparative judgments with D-PAC, a Digital Platform for the Assessment of Competences

Renske Bouwer, Maarten Goossens, San Verhavert and Sven De Maeyer

 

Poster at Assessment in Higher Education (Manchester, 2017)

AHE

 

ABSTRACT

Performance assessments are considered to be a valid way of assessing the competences of students in higher education. In performance assessments, students are asked to perform in an authentic setting, for instance by creating a written product or giving a presentation. As such performances can differ greatly between students, evaluating them is a difficult task. Even when teachers use rubrics with predefined criteria, there are large differences in how they mark students’ performances (Sadler, 2009). To improve the way performance assessments are evaluated, researchers recently developed a Digital Platform for the Assessment of Competences (D-PAC). In D-PAC, teachers do not provide scores for students’ performances; instead, they compare pairs of performances and select the best of each pair in a holistic manner. This process of comparative judgement is considered to be easier than assigning scores to single objects (Thurstone, 1927). Based on the pairwise comparisons of multiple assessors, performances can be ranked on a scale from low to high quality (Pollitt, 2012). D-PAC further allows assessors to provide specific feedback to students. So far, D-PAC has been implemented in 49 user groups for the evaluation of performance assessments in a wide variety of domains, e.g., writing ability, self-reflection, and problem-solving. Assessments included performances in different formats, e.g., texts, pictures, audio, or video. On average, a try-out involved 65 assessees (min = 6, max = 201) and 24 assessors (min = 4, max = 93). Assessors were either teachers or peers. Pairs were automatically generated by a distributed random algorithm. A meta-analysis of the results showed that D-PAC ratings were relatively stable across assessors, resulting in rank-orders with an average reliability between .70 and .80, irrespective of whether the assessors were teachers or peers. The reliability of the rank-order depended on the number of comparisons that were made. In particular, for a reliability of .70, each performance had to be compared with another performance at least nine times. This is higher than the average reliability that can be attained with analytic ratings, even when raters invest the same rating time per paper (cf. Coertjens, Lesterhuis, Verhavert, & De Maeyer, 2016). Further, the validity of the ranking is warranted, as it includes the judgements of multiple assessors who each have their own perspective on what a quality performance looks like. Students were satisfied with the feedback they received through D-PAC. In sum, D-PAC is a credible tool for the assessment of competences, supporting assessors in making reliable and valid judgements without marking.
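The abstract mentions that pairs were generated by a distributed random algorithm and that each performance needed at least nine comparisons to reach a reliability of .70. The actual D-PAC algorithm is not described here; the sketch below shows one simple random strategy that keeps comparison counts balanced, with all names hypothetical.

```python
import random
from collections import Counter

def generate_pairs(performances, comparisons_per_item=9, seed=None):
    """Randomly pair performances so that each one is compared about
    `comparisons_per_item` times, always pairing the least-compared
    items first (repeat pairings are allowed, as different assessors
    may judge the same pair)."""
    rng = random.Random(seed)
    counts = Counter({p: 0 for p in performances})
    pairs = []
    n_pairs = len(performances) * comparisons_per_item // 2
    for _ in range(n_pairs):
        # Two distinct items with the fewest comparisons so far,
        # with random tie-breaking.
        a, b = sorted(performances,
                      key=lambda p: (counts[p], rng.random()))[:2]
        pairs.append((a, b))
        counts[a] += 1
        counts[b] += 1
    return pairs

pairs = generate_pairs(["p1", "p2", "p3", "p4", "p5", "p6"], seed=42)
print(len(pairs))                      # 27 pairs in total
print(min(Counter(p for pair in pairs for p in pair).values()))  # 9
```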

 

An information system design theory for the comparative judgement of competences

Tanguy Coenen et al.

 

Submitted to European Journal of Information Systems, 2017

EJIS