Hello Madeleine,
There really isn't a universal standard that must be reached, because comparison coding can be carried out for many different reasons, and those reasons may have different requirements. There are also many different ways comparison coding can be carried out. Some review organisations have their own standard for the level of agreement required.
The answer to your question comes down to what level of agreement (or disagreement) you are comfortable with, given the parameters of your inclusion/exclusion criteria. If you are double screening a random sample to check that all coders are interpreting the screening tool the same way, you might require 100% agreement before moving forward. If the screening criteria require quite a bit of interpretation, you might be happy with less agreement.
You might also have lots of disagreement on the reason for exclusion but only be concerned with disagreements on inclusion vs exclusion. In that case you might want 100% agreement, but only on the inclusion vs exclusion comparison.
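To make that concrete, here is a small illustrative sketch (the decisions and labels are made up) showing how agreement can look very different depending on whether you compare exact exclusion reasons or collapse everything to include vs exclude:

```python
# Hypothetical screening decisions from two coders: "include", or an
# exclusion reason such as "exclude-population" / "exclude-design".
coder_a = ["include", "exclude-population", "exclude-design", "include", "exclude-design"]
coder_b = ["include", "exclude-design", "exclude-design", "include", "exclude-population"]

def percent_agreement(a, b):
    """Fraction of items where the two coders gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def collapse(label):
    """Reduce any exclusion reason to a single 'exclude' category."""
    return "include" if label == "include" else "exclude"

full = percent_agreement(coder_a, coder_b)                      # 3/5 = 60%
collapsed = percent_agreement([collapse(x) for x in coder_a],
                              [collapse(x) for x in coder_b])   # 5/5 = 100%
print(f"agreement on exact labels: {full:.0%}")
print(f"agreement on include vs exclude only: {collapsed:.0%}")
```

Here the coders disagree on exclusion reasons but agree completely on the decision that actually determines which studies move forward.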
I think the important issue is that you understand the reason for any disagreements, as they might indicate confusion about how the criteria work with the studies in your review.
The kappa statistic itself is often a source of misunderstanding. A good paper to read is 'Fleiss J, Cohen J, Everitt B (1969) "Large Sample Standard Errors of Kappa and Weighted Kappa", Psychological Bulletin, 72(5), pp. 323-327.' There are two statistics, kappa and weighted kappa, and the choice between them depends on the relative seriousness of the possible disagreements (e.g. include vs exclude compared with one exclusion reason vs another).
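In case it helps, here is a minimal sketch of how kappa (and weighted kappa) can be computed from two coders' decision lists; the function, labels, and data are purely illustrative, not from any particular library:

```python
from collections import Counter

def cohens_kappa(a, b, weights=None):
    """Cohen's kappa for two coders' decision lists.
    Pass a weights dict {(label_i, label_j): seriousness} for weighted
    kappa; by default every disagreement counts equally (seriousness 1)."""
    n = len(a)
    cats = sorted(set(a) | set(b))
    if weights is None:
        weights = {(i, j): 0.0 if i == j else 1.0 for i in cats for j in cats}
    observed = Counter(zip(a, b))            # joint counts of (coder_a, coder_b) pairs
    marg_a, marg_b = Counter(a), Counter(b)  # each coder's marginal counts
    # Observed vs chance-expected weighted disagreement.
    d_obs = sum(weights[(i, j)] * observed[(i, j)] / n
                for i in cats for j in cats)
    d_exp = sum(weights[(i, j)] * (marg_a[i] / n) * (marg_b[j] / n)
                for i in cats for j in cats)
    return 1.0 - d_obs / d_exp

# Illustrative screening decisions (made-up data):
coder_a = ["include", "include", "exclude", "exclude", "include", "exclude"]
coder_b = ["include", "exclude", "exclude", "exclude", "include", "include"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.3f}")  # 0.333 for this data
```

For a weighted kappa you would supply, say, a smaller weight for a disagreement between two exclusion reasons than for an include-vs-exclude disagreement, reflecting the point above about the relative seriousness of disagreements.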
If you are calculating a kappa statistic, I have found this paper useful for interpreting the value: 'Viera A, Garrett J (2005) "Understanding Interobserver Agreement: The Kappa Statistic", Family Medicine, 37(5), pp. 360-363.'
Best regards,
Jeff