Using D-PAC for CV-screening

Comparative judgement (CJ) is nowadays predominantly used in the educational domain. The D-PAC team aims to explore CJ's strengths beyond this realm, for example in recruitment and selection. We therefore conducted a try-out to investigate whether D-PAC could be applied successfully to CV screening. To this end, we partnered with Hudson (http://be.hudson.com, a human resources consultancy) and used a job opening received from one of its clients. Forty-two CVs were received, and 7 assessors used D-PAC to compare them. The assessors also provided pairwise feedback to justify each choice. The main questions concerned reliability and validity: (1) how reliable is the D-PAC assessment of the CV screening with expert assessors (if the assessment were performed again, how strongly would the new ranking resemble the current one)? And (2) do all assessors look at the same, relevant criteria in the CVs in relation to the job ad (validity)?
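As background to how such a ranking comes about: comparative judgement converts many pairwise decisions into a single quality scale, commonly by fitting a Bradley-Terry model. The sketch below is a minimal illustration in plain Python with made-up judgements; it is not D-PAC's actual estimation code.

```python
# Minimal Bradley-Terry fit via the MM algorithm (Hunter, 2004).
# The judgements below are made up for illustration only.
from collections import defaultdict

# Each tuple is one judgement: (winner, loser)
judgements = [("cv_A", "cv_B"), ("cv_A", "cv_C"),
              ("cv_B", "cv_C"), ("cv_C", "cv_B"),
              ("cv_A", "cv_B")]

items = {x for pair in judgements for x in pair}
wins = defaultdict(int)    # comparisons won per item
meets = defaultdict(int)   # how often each pair was compared
for winner, loser in judgements:
    wins[winner] += 1
    meets[frozenset((winner, loser))] += 1

strength = {i: 1.0 for i in items}  # initial ability estimates
for _ in range(100):                # MM updates until (near) convergence
    new = {}
    for i in items:
        denom = sum(n / (strength[i] + strength[j])
                    for pair, n in meets.items() if i in pair
                    for j in pair - {i})
        new[i] = wins[i] / denom
    total = sum(new.values())       # normalise to fix the scale
    strength = {i: v / total for i, v in new.items()}

# Higher strength = judged better more often; this ordering is the CJ rank order
for item, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(item, round(s, 3))
```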

Results show that the assessment reached a high reliability (SSR = .88; see Figure 1). Moreover, a reliability of .80 was already achieved after 14 rounds, and the cut-off for acceptable reliability (SSR = .70) was reached after only 9 rounds. The total assessment took 11.5 hours, including pairwise feedback. However, since acceptable reliability was attained early on (after 9 rounds), this time can be reduced drastically, to about 5 hours. Even that is an overestimate, since in practice assessors would not provide feedback on the CVs. To give an indication: it takes about 73 seconds to read two CVs and decide which one fits the job better; when assessors also have to justify their choice, this increases to 90 seconds per pair. In sum, attaining a reliability of .70 without providing any feedback comes down to about five and a half minutes of judging time per CV.
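These figures are easy to verify with a back-of-the-envelope check, assuming that one round means every CV is compared once (21 pairs per round for 42 CVs):

```python
# Back-of-the-envelope check of the timings reported above, assuming
# one "round" = every CV compared once (21 pairs per round for 42 CVs).
cvs, rounds_needed = 42, 9
comparisons = rounds_needed * (cvs // 2)        # 189 comparisons

hours_with_feedback = comparisons * 90 / 3600   # 90 s per pair
print(f"{hours_with_feedback:.1f} h with feedback")    # ~4.7 h, i.e. ~5 h

seconds_without = comparisons * 73              # 73 s per pair
print(f"{seconds_without / cvs / 60:.1f} min per CV")  # ~5.5 min per CV
```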

Figure 1: Reliability (SSR) of the CV-screening assessment. In total, 23 rounds were performed. The blue lines indicate different reliability levels: a reliability of .80 was achieved after 14 rounds, and a reliability of .70 after 9 rounds.

Additionally, the assessors' arguments were analysed to inspect the validity of the assessment. The main themes discussed were 'work and job experience', 'education', 'overqualification' and 'job hopping'. Two themes recurred in all 7 assessors' arguments: work and job experience, and education. One theme, 'age', was discussed by only one assessor. The top arguments per assessor are shown in Figure 2. Most striking is that relevant experience and the amount of experience were the most frequently mentioned by every assessor. Additionally, job hopping was mentioned often by assessor 2.

Figure 2: Top arguments given by each of the 7 assessors.

Next, we investigated which CVs ended up at the lowest or highest positions in the ranking and what type of comments they mainly received. We found that when assessors commented on a candidate's amount of experience (or the lack of it), the CV was more likely to be ranked low. Conversely, when assessors discussed a candidate's education, general experience, overqualification, bilingualism, job hopping or the explanation given for their experience, the CV was more likely to end up in the higher part of the ranking (see Table 1).

Arguments                  Low ranking   High ranking
Amount of experience            40            26
Education                       18            35
General experience               1            22
Overqualified                    0             6
Bilingualism                     2             8
Job-hopping                      2             9
Explanation of experience        0             6

Table 1: Arguments that differ in frequency between CVs in the lower and the higher part of the ranking
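As an aside, the analysis behind Table 1 boils down to coding each feedback comment with a theme and tallying the themes per half of the ranking. A hypothetical sketch (the theme names and data below are made up for illustration):

```python
# Hypothetical sketch of the tallying behind Table 1: every feedback
# comment is coded with a theme and linked to the ranking half (low or
# high) of the CV it concerns, then counted per theme and half.
from collections import Counter

coded_comments = [("education", "high"), ("education", "low"),
                  ("amount of experience", "low"),
                  ("job-hopping", "high"), ("education", "high")]

counts = Counter(coded_comments)
for theme in sorted({t for t, _ in coded_comments}):
    print(f"{theme:22s} low: {counts[(theme, 'low')]:2d}"
          f"  high: {counts[(theme, 'high')]:2d}")
```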

To summarise, this try-out reveals many opportunities. Firstly, it indicates that D-PAC is usable in the recruitment and selection domain, reaching high reliability in a short amount of time. Moreover, the time investment will drop further in future similar assessments, increasing efficiency. Secondly, regarding validity, the analysis of the provided arguments indicates that recruiters share a focus on the experience relevant to this job. At the same time, each recruiter imposes different emphases during the assessment, which is captured when using multiple assessors. This further underpins the logic of including multiple assessors in a CV-screening process.

Peer assessment in D-PAC reduces workload for tutors!

A group of 91 second-year bachelor students in physiotherapy at the University of Hasselt had the following task at the end of this year:

    – They had to formulate a clinical research question based on their experience as a physiotherapist;
    – Then they searched for a relevant scientific paper and formulated an answer to the research question based on the article;
    – Finally, they had to evaluate the article and point out the strengths and weaknesses of the study.

Normally, all these papers are evaluated by one or two tutors, who judge each paper with a 'pass' or 'fail' and provide feedback. As you can imagine, this results in a substantial workload, especially when more than one task per student needs to be marked.

The tutor was inspired by a presentation about the D-PAC project. At first, the tutor was somewhat sceptical, but the possibilities of the tool were tempting enough to set up an experiment in which peers would judge and comment on the papers using the D-PAC tool, while the tutors evaluated the papers in their traditional manner. Afterwards, the students' judgements and feedback could be compared with those of the tutors.

Based on the pairwise comparison data, we calculated the Scale Separation Reliability (SSR) of the student evaluations. The SSR was .80, which can be considered a very reliable scale. To achieve this, the 91 students made 910 comparisons in total; in other words, every paper was compared 20 times.
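For readers who want to know what is behind the SSR: it can be computed from the estimated abilities and their standard errors. A minimal sketch, assuming the usual Rasch-style definition in which the SSR is the share of variance in the estimates that is not estimation error (the numbers below are illustrative, not the study data):

```python
# Minimal SSR sketch, assuming the Rasch-style definition
# SSR = (var(theta) - mean(se^2)) / var(theta); illustrative data only.
import statistics

abilities = [-1.8, -0.9, -0.2, 0.4, 1.1, 1.9]      # estimated abilities
std_errors = [0.45, 0.40, 0.38, 0.39, 0.41, 0.47]  # their standard errors

observed_var = statistics.pvariance(abilities)
error_var = statistics.mean(se ** 2 for se in std_errors)
ssr = (observed_var - error_var) / observed_var
print(f"SSR = {ssr:.2f}")
```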

The feedback students provided was of high quality, a statement supported by the results of a survey among the students. Students perceived the D-PAC peer feedback as relevant, honest and legitimate. Because almost every assessor gave feedback on almost every paper they compared, each student received feedback from 15 to 20 peers. Students indicated this as an added value of the D-PAC method.

If we compare the outcomes of the students' assessment with the tutors' pass/fail decisions, we see a strong resemblance. As Figure 1 shows, the 12 students who were given a fail by the tutor (red dots) are all located on the left side of the rank order. We can conclude that, using pairwise comparison, students can evaluate their peers' papers as well as tutors do in their traditional manner.

Figure 1: Rank order of the papers based on the students' comparisons; red dots mark the papers the tutor failed.

However, as you can see, some blue dots remain on the left-hand side, meaning that students judged these papers to be of poor quality whereas the tutors passed them. Therefore, in the coming year, the tutor will check the 40% lowest-ranked papers to verify whether they should fail. With this combination of peer review and feedback plus a final check by the tutor, the tutor's workload is reduced by at least 60% while the quality of the decisions and the feedback is ensured.

Testimonial from a professor of architecture

The next film is a testimonial by an architecture professor who used D-PAC for a peer assessment of mood boards. Because the movie is in Dutch, you can read a short summary of the main findings below.

Summary
60 students were divided into groups of five. Each group had to create two mood boards, resulting in 20 mood boards. These mood boards were uploaded into the D-PAC tool, and the students each made ten comparisons at home in which they judged their peers' mood boards and provided feedback.

These comparisons resulted in a ranking from the poorest to the best mood board, so each group had two mood boards in the ranking. The students had to continue with the mood board of theirs that was ranked highest, and they could use the feedback to improve that design.

The teacher used the rank order and the students' feedback to discuss the results with the group. He reported a large time saving, because all students had already seen the mood boards and formed their opinions. Whereas the discussion of the mood boards normally lasted a whole day, it now took one hour using the rank order, and this, according to the professor, without sacrificing the quality of the discussion; on the contrary.

Furthermore, the professor indicated that he saved time processing the results of the peer assessment afterwards: there was in fact nothing to process, because the results were generated automatically by the tool.

Also, according to the professor, the learning effect for students of viewing their peers' work and having to formulate reasons why one mood board was better than another should not be underestimated.

D-PAC field trials

We at D-PAC are very happy to see that more and more organisations are interested in testing the open-source D-PAC platform (https://github.com/d-pac). As the table below shows, we have run a variety of assessments in diverse applications in the education and HR sectors.

In each of these assessments, the organisation we work with gets the opportunity to find out, in a hands-on way, what assessment through Comparative Judgement can mean for them. The deal is that we provide the hosted software, and advice on how to set up and run the system, for free, while the organisation hosting the assessment provides us with the research data we need to contribute to the advancement of knowledge on Comparative Judgement and the tools that support it.

# | Competence | Domain | Assessees | Assessors | Reliability
1 | Argumentative writing | Education | 135 high-school students | 68 teachers and teachers in training | 0.81 (average)
2 | Writing formal letters | Education | 12 high-school students | 11 teachers in training | 0.68
3 | Mathematical problem solving | Education | 58 high-school students | 10 mathematics teachers + 4 mathematics teachers in training | 0.80 (average)
4 | Capability of visual representation in the arts domain | Education | 11 high-school students (147 representations) | 13 teachers | 0.85
5 | Interpreting statistical output using peer evaluation | Education | 44 master students | 33 master students | 0.80
6 | CV screening | Human resources | 42 candidates | 7 HR professionals | 0.88
7 | Self-reflections | Education | 22 master students | 9 teachers | 0.75
8 | Project proposals | – | 6 projects | 5 judges | 0.71

Besides the domains of education and human resources, we are also very curious about what other applications Comparative Judgement could have. For example, would it be useful for A/B testing in the marketing or IT-development domain, and would it be interesting as an alternative voting mechanism in, for example, TV shows?

To conclude, there could be a range of applications for the D-PAC tool that we have not yet thought of, so we are looking for ideas on how to expand the impact of our tool.
Feel free to contact us if you are interested in testing D-PAC in your own context, and together we can find out how we can help each other: d-pac@uantwerpen.be


The D-PAC team