HelpForum

Forum (Archive)

This forum is kept largely for historical reasons and for our latest-changes announcements. (It focused on the older EPPI Reviewer version 4.)

There are many informative posts and answers to common questions, but if you are an EPPI Reviewer WEB user you may find our videos and other resources more useful.

You can search the forum for existing posts. If you have questions or require support, please email eppisupport@ucl.ac.uk.

<< Back to main Help page

Managing Duplicates - Advanced Mark Automatically
New Post
13/01/2014 16:23
 

Hi there,

We have a large number of duplicates to get rid of in our review, and we would like to know the minimum similarity threshold you would recommend to automatically mark duplicates. So far we have reduced to 0.95 similarity but still have around 5000 to go through!

Thanks,

Jennifer

 
New Post
14/01/2014 10:02
 

Hello Jennifer,

The lowest threshold you would want to set depends on the similarity scores that you see in your duplicate groups. The similarity scores, in turn, often depend on the completeness of your imported items: if your records have missing data, the scores will be quite low. Likewise, if some of your author names are given in full while others use initials, you will see lower similarity scores.
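To illustrate the point about record completeness: EPPI Reviewer's internal similarity metric is not documented here, so the sketch below uses Python's standard-library `difflib` ratio purely as a stand-in. The citation strings are invented examples, but they show why records with initials or missing fields score below 1.0 against a complete record.

```python
# Illustration only: difflib's ratio stands in for EPPI Reviewer's
# (undocumented here) similarity metric.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two citation strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical records for the same paper, imported from different sources.
full = "Smith, John A. (2013) Effects of exercise on mood. J Health Psychol."
initials = "Smith, J. (2013) Effects of exercise on mood. J Health Psychol."
missing = "Smith, John A. (2013) Effects of exercise on mood."

print(similarity(full, full))      # identical records score exactly 1.0
print(similarity(full, initials))  # initials pull the score below 1.0
print(similarity(full, missing))   # a missing journal field lowers it too
```

Any real metric will differ in detail, but the direction of the effect is the same: the less complete or consistent the imported fields, the lower the score for a genuine duplicate pair.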

What I normally do is run it at the default initially (1.0) to complete as many groups as possible. I then look through the incomplete groups to get a feel for the range of similarity scores. Running it again at 0.95 (as you did) is what I would have done. I would then look at a number of incomplete groups, check their similarity scores, and estimate how low I could go. If I found items with similarity scores above 0.9 that weren't duplicates, that would tell me not to go that low. If everything above 0.9 was a duplicate, I would run it again at that level to catch as many more groups as possible.

I would continue to lower the score as long as I couldn't find a group that wasn't a duplicate. The lowest I have probably gone is about 0.85 but that was based on the similarity scores I was seeing in my data.
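The procedure above can be sketched as a simple loop: step the threshold down from 1.0 and stop as soon as a spot check finds a group at the candidate level that is not a true duplicate. The function and the `checked` results below are hypothetical stand-ins for the manual checks done in EPPI Reviewer's duplicates interface, with 0.85 as the floor mentioned above.

```python
# Hypothetical sketch of the threshold-lowering procedure described above.
# spot_check stands in for manually reviewing groups at a candidate level.

def safe_threshold(spot_check, start=1.0, floor=0.85, step=0.05):
    """Lower the auto-mark threshold while spot checks find no false matches.

    spot_check(t) should return True only if every manually checked group
    with a similarity score at or above t was a genuine duplicate.
    """
    threshold = start
    candidate = round(threshold - step, 2)
    while candidate >= floor and spot_check(candidate):
        threshold = candidate
        candidate = round(threshold - step, 2)
    return threshold

# Invented spot-check results: everything at or above 0.90 was a true
# duplicate, but a non-duplicate pair appeared at 0.85.
checked = {1.0: True, 0.95: True, 0.9: True, 0.85: False}
print(safe_threshold(lambda t: checked[t]))  # → 0.9
```

The key design point matches Jeff's advice: the threshold only moves down when the data earns it, so the lowest "safe" level is a property of your particular records, not a fixed number.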

Best regards,

Jeff

 
New Post
14/01/2014 10:26
 

Thank you Jeff, this is very helpful.

Best,

Jennifer

 


Copyright 2021 by EPPI-Centre