How the Academy elicits feedback on artworks via a live app

‘Letting our students experience for themselves how visitors perceive their creations, which observations they make, how they take in their work; that is the goal of our Black Box exhibition’, says Gio De Weerd, director of the Stedelijke Academie voor Schone Kunsten in Lier. In our online tool for assessing competences he saw an opportunity to realise this goal in an innovative way. Visitors walked through the exhibition with the D-PAC app, repeatedly comparing two artworks, picking a favourite, and giving both students feedback.

Exhibiting to learn from the audience

The art academy in Lier gives its students an annual opportunity to present their work. With the Black Box exhibition the school clearly wants to organise more than a mere showcase: it deliberately involves visitors so that students experience how an audience looks at their work. Previously this happened only through feedback on index cards that visitors could drop anonymously into a black box; now, as an experiment, it also happens through D-PAC’s live app.


Comparing artworks live and giving feedback online

The school found 14 visitors willing to take part in its experiment to give students insight into how an audience experiences their work. At the end of January they set off enthusiastically past the work of 10 different exhibitors, criss-cross, guided by their smartphone or tablet, on which two randomly chosen artworks lit up each time. The task was clear: which of these two artworks do you think is better? Visitors picked a favourite each time and gave feedback on both artworks. The tasks followed one another, and the same work also had to be compared more than once. In this way each visitor walked a highly individual route past the works and had seen all of them by the end.
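How D-PAC actually schedules its comparisons is not described here, but a minimal sketch of the idea, randomly pairing works so that each one recurs a fixed number of times and every visitor eventually sees everything, could look like this in Python (the `schedule_pairs` helper is hypothetical, not D-PAC’s algorithm):

```python
import random

def schedule_pairs(works, appearances=4):
    """Naive scheduler: shuffle the works and pair them off, repeating
    until every work has appeared in `appearances` comparisons.
    Assumes an even number of works."""
    pairs = []
    for _ in range(appearances):
        pool = list(works)
        random.shuffle(pool)
        pairs += list(zip(pool[::2], pool[1::2]))
    return pairs

# 10 exhibits, each shown in 4 comparisons: 20 pairs per visitor
print(schedule_pairs(range(10)))
```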


The live D-PAC app provides an authentic experience when comparing and judging artworks.


‘You cannot experience art from photos alone’

For the school, using the live application meant that visitors could judge the look & feel of real, tangible photography, drawings, and installations. Most visitors went to inspect the works in the flesh rather than judging the photos on their device. ‘That simply doesn’t work from photos alone’, one visitor explained.


Art students receive feedback on craftsmanship and artistry

Letting students learn from visitor feedback: that is what the Academy was after, and the D-PAC live app proved an ideal channel for it. Visitors judged the artworks extensively on two aspects. On craftsmanship, for instance, visitors gave remarks and tips about a particular painting’s tonal gradation, sense of depth, colour shading, illusion of perspective, and so on. The feedback on artistry was equally rich: some visitors described how an artwork set them thinking or invited interpretation, while others reacted to a work’s predictability or its excess of melancholy.


D-PAC yields varied feedback from multiple assessors

What makes D-PAC interesting for the Academy is that every visitor reports more than one observation per work. A defining feature of the tool is that each work turns up multiple times, in different comparisons, as if the live app forces the visitor to put on a slightly different pair of glasses at every new comparison of a given work. ‘You can hardly give the same feedback on a particular painting every time’, one visitor explained. An assessor who sees a work several times will notice something slightly different or new on each occasion.

In addition, the D-PAC live app ensures that all participating artists receive feedback from all visitors, because the digital tool effectively compels assessors to compare every work against a randomly chosen other and to write down feedback. ‘That was sometimes different in previous editions of the Black Box exhibition’, explains Gio, the head of the Academy. Some artists would receive more feedback than others, which is of course a missed opportunity.


The D-PAC live app makes the viewer look at an artwork in a different way each time, which ultimately makes the feedback richer for the artist.


Lessons from the arts

What we ourselves learned from this first try-out of the D-PAC live application is that visitors considered the task of comparing two works, choosing a favourite, and giving feedback a good technique. Good, but not easy.

What made the comparative judging difficult was that:

  • the works differed considerably in nature and execution;
  • the works were not always physically close to each other;
  • the app is not yet fully polished for use on a mobile device.

Feedback we will certainly take on board for the next live deployments of D-PAC.


Want to try it yourself?

Are you involved in an academy, conservatory, museum, or exhibition, and would you like to experiment with the D-PAC live tool yourself? Are you curious about its various possible applications for your organisation?

Then get in touch with us soon:


Which video is better? D-PAC allows educators to assess videos in an easy and credible way

Different types of media, such as video, audio, or images, are increasingly used to assess students’ competences. However, because these media allow for large variation in performance between students, grading them is difficult. The online tool D-PAC aims to support educators in assessing video and images.


In D-PAC, students can easily upload their work in any media type (text, audio, image, video), after which the work is presented in randomly selected pairs to the assessors. The only task for assessors is to choose which of the two is best, using their own expertise. Assessors find it easy to make such comparative judgements because they are not forced to score each work on a (long) list of criteria. Each work is presented multiple times to multiple assessors, resulting in a scale on which students’ work is ranked according to its quality.
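D-PAC builds this scale from the choice data with a statistical model (the Bradley-Terry-Luce model, mentioned later in this newsletter). As a rough illustration only, not D-PAC’s actual code, here is a self-contained Python sketch of fitting such a model and reading off the ranking:

```python
import numpy as np

def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs using the
    classic Zermelo/MM updates. Assumes every item wins and loses at
    least once (a connected comparison graph)."""
    wins = np.zeros(n_items)
    met = np.zeros((n_items, n_items))          # met[i, j]: times i faced j
    for w, l in comparisons:
        wins[w] += 1
        met[w, l] += 1
        met[l, w] += 1
    p = np.ones(n_items)                        # initial strengths
    for _ in range(iters):
        denom = (met / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins / denom
        p /= p.sum()                            # fix the overall scale
    return p

# toy data: work 0 beats 1 twice, 1 beats 2, 2 beats 0 once
strengths = bradley_terry(3, [(0, 1), (0, 1), (1, 2), (2, 0)])
print(np.argsort(-strengths))                   # ranking, best first
```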


‘Working with D-PAC was really easy and fast.’

Ivan Waumans, KDG University College

Recently, D-PAC was used in a Bachelor’s programme in Multimedia and Communication Technology to assess students’ animation skills. Students received an audio fragment of a radio play by ‘Het Geluidshuis’ and had to accompany it with animation. A group of 9 assessors with varied backgrounds and expertise evaluated the quality of the animations: 3 people from Het Geluidshuis, 2 expert animators, 2 alumni, and 2 teachers.


For Ivan Waumans, coordinator of the course, working with D-PAC was really easy and fast. ‘About 2 hours after I sent the login information to the assessors I got an email from one of them saying: Done!’ Assessors valued that they could do the evaluations from their homes or offices. Some did all the comparisons in one session, whereas others spread them over a few days. None of them had any trouble using or understanding D-PAC. The only difficulty assessors experienced was having to choose between 2 videos of equal quality. Ivan had to reassure them that it was fine to just pick one, because the tool generates the same ability score for videos of equal quality. Ability scores represent the likelihood that a particular video will win against others; based on these scores, the tool produces a ranking in which videos are ordered from poor to high quality.
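That reassurance follows directly from the model behind the scores. In the Bradley-Terry-Luce formulation such ability scores are based on, the predicted chance of winning depends only on the difference between two abilities, so two videos of equal ability each win half the time regardless of which one an individual assessor happens to pick. A minimal sketch:

```python
import math

def win_probability(ability_a, ability_b):
    """Bradley-Terry-Luce: chance that A is preferred over B,
    with abilities on a logit scale."""
    return 1.0 / (1.0 + math.exp(ability_b - ability_a))

print(win_probability(1.2, 1.2))   # equal quality -> 0.5, either pick is fine
print(win_probability(2.0, 0.5))   # clearly better video wins ~82% of the time
```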


[Video: Assessors evaluated the quality of animations using pairwise comparisons in D-PAC]


‘After explaining comparative judgement, students accepted their grade’

Ivan and his team assigned grades to the animations based on the rank order and the ability scores. As there were gaps between ability scores, the final grades were not distributed evenly over the ranking: the top 2 videos, for instance, got 18/20 and 16/20. Teachers were happy with this more objective grading system. ‘When I look at certain videos and their grade, I notice that I would have given a higher or lower grade depending on my personal taste or my relation with the students’, Ivan explained. In his experience, including external people in the evaluation eliminated this bias. Only 2 students were a bit disappointed about the grade they received, but after the procedure of comparative judgement was explained, they accepted it. The fact that 9 people contributed to ranking the videos, instead of only one teacher, convinced them the grade was fair.
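The post does not spell out how the team turned scores into grades, but one hypothetical way to let gaps between ability scores show up as gaps between grades is a simple linear mapping of the ability range onto the grade range (all numbers below are made up):

```python
import numpy as np

def grades_from_abilities(abilities, best=18, worst=8):
    """Hypothetical linear mapping from the ability scale to grades,
    so gaps between ability scores become gaps between grades."""
    a = np.asarray(abilities, dtype=float)
    return worst + (a - a.min()) / (a.max() - a.min()) * (best - worst)

# the gap after the top video yields 18/20 and 16/20 for the top two
print(grades_from_abilities([2.0, 1.3, 0.1, -0.4, -1.1]).round())
```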


More information

D-PAC allows educators to assess students’ performance in video or images in a more reliable and credible way, without increasing the workload of teachers.

Want to find out more? Send us an e-mail.


Media & Learning Newsletter

This blog post has been published in the newsletter of Media & Learning.

Our northern neighbours value our expertise

Under the motto ‘professionalising together’, Hogeschool Zuyd (Heerlen, Sittard, Maastricht) has picked up the D-PAC story. Intrigued by our knowledge and experience in professional assessment and peer assessment, they want to inspire and encourage others within their institution as well.

In collaboration with Dominique Sluijsmans (lector in Professional Assessment, Zuyd) and Judith van Hooijdonk (I-team, Zuyd), a blog post about D-PAC as a tool for Technology Enhanced Learning (TEL) was published earlier this week.

The blog page of ICT in Onderwijs en Onderzoek @ Zuyd also features plenty of other interesting news.

Thank you, I-team, for networking with and for us; we look forward to the reactions to this engaging blog post.

Using D-PAC for CV-screening

Comparative judgement (CJ) is nowadays used predominantly in the educational domain. The D-PAC team aims to explore CJ’s strengths beyond this realm, for example in recruitment and selection. We therefore conducted a try-out investigating whether D-PAC works when applied to CV screening, partnering with Hudson (http://be.hudson.com, a human resources consultancy company) and using a job opening received from one of its clients. Forty-two CVs came in, and 7 assessors compared them in D-PAC, also providing pairwise feedback to justify each choice. The main questions concerned reliability and validity: (1) how reliable is a D-PAC CV-screening assessment with expert assessors (if the assessment were performed again, how strongly would the new ranking resemble the current one)? And (2) do all assessors look at the same, relevant criteria in the CVs in relation to the job ad (validity)?

Results show that the assessment reached a high reliability (SSR = .88 over all 23 rounds; see Figure 1). A reliability of .80 was already achieved after 14 rounds, and the cut-off for acceptable reliability (SSR = .70) after only 9 rounds. The total assessment took 11.5 hours, including pairwise feedback. Since acceptable reliability was attained early on (9 rounds), this can be cut drastically, to about 5 hours. Even that is an overestimate, because in practice assessors would not provide written feedback on the CVs. To give an indication: it takes about 73 seconds to read two CVs and decide which one better fits the job; when assessors also have to justify their choice in writing, this rises to 90 seconds per pair. In short, reaching a reliability of .70 without feedback comes down to about five and a half minutes of assessor time per CV.
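The per-CV figure can be reconstructed with some back-of-the-envelope arithmetic, assuming a ‘round’ pairs each of the 42 CVs once (21 comparisons per round):

```python
cvs = 42
pairs_per_round = cvs // 2       # one round pairs every CV once: 21 comparisons
rounds_needed = 9                # rounds at which SSR reached .70
seconds_per_pair = 73            # read two CVs and pick one, no feedback

total_seconds = rounds_needed * pairs_per_round * seconds_per_pair
print(total_seconds / cvs / 60)  # ~5.5 minutes of assessor time per CV
```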

Figure 1: Reliability (SSR) of the CV-screening assessment over the 23 rounds performed. Blue lines indicate different reliability levels: .80 was reached after 14 rounds, .70 after 9 rounds.

Additionally, the assessors’ arguments were analysed to inspect the validity of the assessment. The main themes discussed were ‘work and job experience’, ‘education’, ‘overqualification’, and ‘job hopping’. Two themes recurred in the arguments of all 7 assessors: work and job experience, and education. One theme, ‘age’, was mentioned by only one assessor. The top arguments per assessor are presented in Figure 2. Most striking is that relevant experience and the amount of experience were mentioned most frequently by every assessor. Job hopping, in addition, was mentioned often by assessor 2.

Figure 2: Top arguments given by each of the 7 assessors.

Next, we investigated which CVs ended up at the lowest and highest positions in the ranking and what type of comments they mainly received. When assessors mentioned something about a candidate’s (lack of) experience, that CV had a higher chance of being ranked low. Conversely, when assessors discussed a candidate’s education, general experience, overqualification, bilingualism, job hopping, or the explanation given for their experience, the CV was more likely to end up in the higher part of the ranking (see Table 1).

Arguments                   Low ranking   High ranking
Amount of experience        40            26
Education                   18            35
General experience          1             22
Overqualified               0             6
Bilingualism                2             8
Job hopping                 2             9
Explanation of experience   0             6

Table 1: Argument counts that differ between CVs in the lower and the higher part of the ranking.
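As an illustration of how such a table could be derived from the raw feedback (a hypothetical helper, not the team’s actual analysis script), one could tally coded themes per half of the ranking:

```python
from collections import Counter

def theme_counts(ranking, comments):
    """ranking: CV ids ordered from best to worst.
    comments: (cv_id, theme) pairs coded from the assessors' feedback.
    Tally how often each theme was raised in the higher vs lower half."""
    high = set(ranking[: len(ranking) // 2])
    tally = {"high": Counter(), "low": Counter()}
    for cv, theme in comments:
        tally["high" if cv in high else "low"][theme] += 1
    return tally

# toy usage with 4 CVs and a few coded comments
print(theme_counts([3, 1, 0, 2], [(3, "education"), (2, "experience")]))
```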

To summarise, this try-out shows many opportunities. First, it indicates that D-PAC is usable in the recruitment and selection domain, reaching high reliability in a short amount of time; the time investment will drop further in future, similar assessments, increasing efficiency. Second, regarding validity, the analysis of the arguments indicates that recruiters share a focus on experience relevant to the job, but each recruiter also brings their own emphases to the assessment, which is captured when using multiple assessors. This further underpins the logic of including multiple assessors in a CV-screening process.

Peer assessment in D-PAC reduces workload for tutors!

A group of 91 second-year Bachelor’s students in physiotherapy at the University of Hasselt had the following task at the end of this year:

    – They had to formulate a clinical research question based on their experience as a physiotherapist;
    – then they searched for a relevant scientific paper and formulated an answer to the research question based on the article;
    – finally, they had to evaluate the article and point out the strengths and weaknesses of the study.

Normally all these papers are evaluated by one or two tutors, who judge each paper as ‘pass’ or ‘fail’ and provide feedback. As you can imagine, this results in a substantial workload, especially when more than one task per student needs to be marked.

The tutor was inspired by a presentation about the D-PAC project. At first the tutor was a bit sceptical, but the possibilities of the tool were tempting enough to set up an experiment in which peers would judge and comment on the papers using the D-PAC tool, while the tutors evaluated the papers in their traditional manner in parallel. Afterwards, the students’ judgements and feedback could be compared with those of the tutors.

Based on the pairwise comparison data we calculated the Scale Separation Reliability (SSR) for the student evaluations. The SSR was .80, which indicates a highly reliable scale. To achieve this, the 91 students made 910 comparisons in total; since each comparison involves two papers, every paper was compared 20 times (910 × 2 / 91).
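SSR is commonly computed as the share of observed variance in the ability estimates that is not measurement error (a Rasch-style separation reliability; D-PAC’s exact implementation may differ). A minimal sketch under that assumption:

```python
import numpy as np

def scale_separation_reliability(abilities, standard_errors):
    """Separation reliability: (observed variance - error variance)
    divided by observed variance, i.e. the 'true' share of the spread."""
    observed = np.var(abilities, ddof=1)
    error = np.mean(np.square(standard_errors))
    return (observed - error) / observed

# made-up estimates for five papers with their standard errors
print(scale_separation_reliability([1.8, 0.9, 0.0, -0.7, -2.0],
                                   [0.5, 0.4, 0.4, 0.5, 0.6]))
```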

The feedback students provided was of high quality, a statement supported by the results of a survey among the students. Students perceived the D-PAC peer feedback as relevant, honest, and legitimate. Because almost every assessor gave feedback on almost every paper they compared, each student received feedback from 15 to 20 peers, which students singled out as an added value of the D-PAC method.

Comparing the outcomes of the students’ assessment with the tutors’ pass/fail decisions shows a strong resemblance. As Figure 1 shows, 12 students were failed by the tutor (red dots), and all of them are located on the left side of the rank order. We can conclude that students, using pairwise comparison, can evaluate their peers’ papers as well as tutors do in their traditional manner.

[Figure 1: Rank order of the papers; red dots indicate papers the tutor failed]

However, as you can see, some blue dots remain on the left-hand side, meaning that students judged these papers to be of poor quality whereas the tutors passed them. Therefore, in the coming year, the tutor will check the 40% lowest-ranked papers to verify whether they should fail. Using this combination of peer ranking and feedback with a final check by the tutor, the tutor’s workload is reduced by at least 60% while the quality of the decision and the feedback is preserved.

Testimonial professor architecture

The following video is a testimonial from an architecture professor who used D-PAC for a peer assessment of mood boards. Because the video is in Dutch, a short summary of the main findings is given below.





Summary
60 students were divided into groups of five. Each group had to create two mood boards, resulting in 20 mood boards in total. These mood boards were uploaded into the D-PAC tool, and each student made ten comparisons at home in which they judged their peers’ mood boards and provided feedback.

These comparisons resulted in a ranking from the poorest to the best mood board, so each group had two mood boards somewhere in the ranking. Each group then had to continue with its highest-ranked mood board, and could use the feedback to improve that design.

The teacher used the rank order and the feedback from the students to discuss the results with the group. He reported a large time saving, because all the students had already seen the mood boards and formed their opinions: where the discussion of the mood boards normally lasted a whole day, it now took one hour using the rank order. According to the professor, this came without sacrificing the quality of the discussion, quite the contrary.

Furthermore, the professor indicated that he also saved time afterwards: there was nothing to process, because the results of the peer assessment were generated automatically by the tool.

Finally, according to the professor, the learning effect for students of studying their peers’ work and formulating reasons why one piece was better than another should not be underestimated.

D-PAC successfully handles video-material on large scale

A first pairwise comparison experiment with video material in D-PAC has been completed successfully. The goal of the experiment was twofold: (1) to test the scalability of the tool with videos, and (2) to test the inter-rater reliability.

A group of 134 students in Education Sciences judged 9 clips on the quality of the simulated semi-structured scientific interview they demonstrated. The pairwise comparisons were all scheduled synchronously, so in total 134 assessors were simultaneously interacting with the D-PAC system, which was sending out video clips to them. No technological issues arose during the experiment, leading to a very positive conclusion about the scalability of the D-PAC tool.

To test the inter-rater reliability, the group of assessors was split into three random groups of 46, 44, and 44 assessors. All groups assessed the videos comparatively; the only difference between them was the feedback asked for after each completed comparison. Group 1 was not specifically instructed to give any argumentation or feedback during the process; Group 2 was asked to give a short overall argumentation for each choice; and Group 3 was asked to write down some positive and negative features of each interview after each comparison. The groups made 520, 354, and 351 comparisons, respectively.

Based on the pairwise comparison data we calculated the Scale Separation Reliability for each of the three groups of assessors separately. The results are given in Table 1 and show that the reliabilities are high (.91–.93).

Table 1. Scale separation reliability and average number of comparisons per video

          Scale Separation Reliability   Average number of comparisons per video
Group 1   .93                            104
Group 2   .93                            79
Group 3   .91                            78


To answer the question of inter-rater reliability, we calculated the correlations between the estimated abilities (based on the Bradley-Terry-Luce model) from each of the three assessments (see Table 2). The Spearman rank correlation between the assessment in which assessors provided an argumentation (Group 2) and the one in which they provided feedback (Group 3) is the highest (.87). The correlations between the scores from the assessment without any argumentation (Group 1) and the two other conditions are somewhat smaller (.82 and .84). Overall, these correlations are high.


Table 2. Spearman Rank Correlations between scores coming from the 3 groups of assessors

          Group 1   Group 2   Group 3
Group 1   1
Group 2   .82       1
Group 3   .84       .87       1
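A Spearman rank correlation compares only the orderings of the two sets of ability estimates, so it is unaffected by any monotone rescaling of the scales. A minimal sketch with made-up ability estimates for the 9 clips (the real estimates are not reproduced here):

```python
from scipy.stats import spearmanr

# hypothetical ability estimates for the 9 clips from two assessor groups
group_2 = [0.4, -1.2, 2.1, 0.0, -0.3, 1.5, -2.0, 0.9, -1.4]
group_3 = [0.6, -1.0, 1.8, -0.5, -0.2, 1.7, -1.8, 1.1, -1.6]

rho, _ = spearmanr(group_2, group_3)
print(round(rho, 2))  # correlation between the two rank orders, here 0.98
```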


Given that each of the 36 possible pairs was assessed by multiple assessors within and between the three groups, we were able to calculate the agreement between assessors for each possible pair. In Figure 1 the agreement is plotted per pair, split up for the three groups of assessors. The average agreement in each group is around 77%; for some pairs the agreement is only 50%, for others it is 100%. These differences can, of course, be partially attributed to the fact that some pairs are more difficult to judge than others. Comparing the results of the three groups showed no significant differences.

[Figure 1: Agreement per pair, split up for the three groups of assessors]
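‘Agreement’ here can be operationalised as the share of assessors who sided with the majority choice for a pair; a hypothetical helper (not the project’s analysis code) makes the 50%–100% range obvious:

```python
from collections import defaultdict

def agreement_per_pair(judgements):
    """judgements: (clip_a, clip_b, winner) tuples with clip_a < clip_b.
    Agreement for a pair = share of assessors siding with the majority."""
    wins = defaultdict(lambda: [0, 0])
    for a, b, winner in judgements:
        wins[(a, b)][0 if winner == a else 1] += 1
    return {pair: max(w) / sum(w) for pair, w in wins.items()}

# three assessors split 2-1 on the pair (0, 1) -> 67% agreement
print(agreement_per_pair([(0, 1, 0), (0, 1, 0), (0, 1, 1)]))
```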

To conclude, this pairwise comparison experiment first of all demonstrates the robustness of the tool when large numbers of assessors judge video clips simultaneously. The resulting scales and pairwise comparison data also showed that the inter-rater reliability is high.