Duplicates - Questions about using EPPI-Reviewer

Forum (Archive)

This forum is kept largely for historic reasons and for our latest changes announcements. (It was focused around the older EPPI Reviewer version 4.)

There are many informative posts and answers to common questions, but you may find our videos and other resources more informative if you are an EPPI Reviewer WEB user.

If you have questions or require support, please email eppisupport@ucl.ac.uk.


29/03/2011 10:33

Daniel Vujcich

Joined: 15/05/2013

Posts: 8

Duplicates

Good morning,

This is perhaps a basic question re: finding duplicates, but one for which I can't seem to piece together a clear answer from the Manual.

I am working on a very large systematic review and have had to import my documents in small batches to get around the problem of having an internet connection that is seemingly too slow for the database.

The corollary of this is that I ran the Get New Duplicates function a few times, and then abandoned it because I thought that the time that I took to screen the abstracts would actually be comparable to the time it takes to manually check for duplicates. I now realise that that may not have been the best choice as, according to RefWorks, there are some 8000 duplicates.

What I have now decided to do is to run a "Mark Automatically" operation to at least identify (100%) matches to make my life a little easier. I just did this but have found that the results are mixed in with the results of past "check for duplicate" exercises that I never had a chance to manually check. Is there a way of just getting the database to produce me a list of duplicate documents marked with a value of 1 that have not been coded?

Best regards,

Daniel

29/03/2011 11:20

Sergio Graziosi

Joined: 17/10/2011

Posts: 318

Re: Duplicates

Hi Daniel,

I must admit that I am not sure I understand what you are after. Are you asking how to get a list of duplicates without automatically marking them? If that's the case, we can produce such a list quite easily here. EPPI-Reviewer does not have this functionality built in: we never anticipated the need. Otherwise, if you are just looking for an easy way forward, I think you might take advantage of the "Reset" functionalities ("manual/advanced" tab). I am assuming that your problem is that you have multiple groups containing different subsets of the same items.

If you trust the work you've done so far, you can get rid of all old groups without discarding current duplicates. In this way you'll get a fresh start without losing anything. When you click "get new duplicates" after "Reset" and "Yes, I confirm (1st option)", you'll get a fresh list of new duplicate groups reflecting the putative duplicates that have not been marked already. This should allow you to run "Mark Automatically" without having to bother about what was done before. Note that in the "sources" tab (main screen) you'll get the detailed numbers about duplicates.

Alternatively, if you don't trust what has been checked already, you can choose the second "Reset" option. This will wipe all duplicates information, allowing you to restart from scratch.

The final option is to wait for today's update: this will allow you to manually delete any group of your choice. I think of it as a less radical solution than what you can do through "Reset".

I hope this helps!

Sergio

29/03/2011 12:01

Daniel Vujcich

Joined: 15/05/2013

Posts: 8

Re: Duplicates

Perfect! I didn't realise that reset was an option. Thanks for this.

30/03/2011 10:07

Daniel Vujcich

Joined: 15/05/2013

Posts: 8

Re: Duplicates

Morning! Another question on this issue, I'm afraid. Is there an easy way of deleting the uncoded duplicate of anything that has already been coded?

Best,

Daniel

30/03/2011 11:37

Sergio Graziosi

Joined: 17/10/2011

Posts: 318

Re: Duplicates

Hi Daniel,
I think you would like to automatically change the masters so that whatever has not been coded will never be a master item. After this is done, running "mark automatically" will catch more items and spare you some manual work. Is that what you have in mind? In any case, at the moment it is not possible, I'm afraid! However, it is something I have in the back of my mind: it's tricky to implement, so I've left it on my wish-list for the time being.

Still, I have another suggestion for you (I had a look at your review stats, I hope you don't mind): since you have already done many exclusions, it might be unnecessary to de-duplicate all your items. EPPI-Reviewer allows you to start the screening with all items marked as Included and then gradually mark them as Excluded, but also permits the other way round (starting with everything Excluded). For this reason, the duplicate search algorithm evaluates both included and excluded items: what it does not evaluate (with one exception*) is deleted items.

What you could do is:
1) Walk through your list of excludes and mark them as deleted.
2) In duplicate checking, choose the first “reset” option.
3) Get new duplicates once more.
At this point you should have a much shorter list of duplicate groups to deal with. Also, if you know that all remaining items will have at least one code applied (I’m thinking of an “include” code of some kind, for example), you could use the “Advanced Mark Automatically” and change the “Codes Threshold”.
4) You will now be able to walk through all your deleted items and mark them as Excluded once more (if needed). This can be done before, after or during your duplicate sifting: it doesn’t matter when. This will work for you because you have only one item that was explicitly marked as deleted (the only “manually entered” item); all other items are either Included, Excluded, Duplicates, or part of a deleted source. When you list the deleted items, you’ll get all items that were previously excluded (plus one), and hence you won’t lose any information. As a double safety, duplicates and items contained in a deleted source do not show up when listing “Deleted Items”, and even if you explicitly try to mark them as included/excluded, they will remain untouched.

Of course, proceeding in this way is possible only if you don’t need to know how many duplicates you had in the excluded items list.
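For readers who find it easier to follow the four steps as a small simulation, here is a minimal Python sketch. It is not EPPI-Reviewer code: the `Item` class, the flag letters and the `find_duplicate_groups` matcher (which simply compares lower-cased titles) are all assumptions made up for illustration. The only behaviour taken from the steps above is that items flagged as deleted are skipped by the duplicate search.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    title: str
    flag: str  # hypothetical flags: "I" included, "E" excluded, "D" deleted


def find_duplicate_groups(items):
    """Toy matcher (an assumption): group non-deleted items by lower-cased title."""
    groups = {}
    for item in items:
        if item.flag == "D":
            continue  # deleted items are not evaluated
        groups.setdefault(item.title.strip().lower(), []).append(item.item_id)
    return [ids for ids in groups.values() if len(ids) > 1]


items = [
    Item(1, "Malaria nets trial", "I"),
    Item(2, "malaria nets trial", "I"),
    Item(3, "Hand washing study", "E"),
    Item(4, "Hand washing study", "E"),
    Item(5, "Old manual entry", "D"),  # the one item deleted on purpose
]

# Remember which items were deleted before the workaround started.
originally_deleted = {item.item_id for item in items if item.flag == "D"}
groups_before = find_duplicate_groups(items)

# Step 1: mark every excluded item as deleted.
for item in items:
    if item.flag == "E":
        item.flag = "D"

# Steps 2 and 3: the first reset option discards old groups, modelled here
# by simply searching again from scratch.
groups_after = find_duplicate_groups(items)

# Step 4: put the temporarily deleted items back to Excluded.
for item in items:
    if item.flag == "D" and item.item_id not in originally_deleted:
        item.flag = "E"

print(groups_before)  # two groups: one included pair, one excluded pair
print(groups_after)   # only the included pair is left to check
```

The point of the sketch is the bookkeeping: the excluded pair drops out of the list of groups, and step 4 restores every flag except the item that was deleted to begin with.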

Note for the other readers: what I’m suggesting here applies to Daniel’s review because he didn’t finish his duplicate checking after importing all items; instead, he proceeded directly to screening and coding. The best practice remains different: whenever possible, the quickest way to get duplicates out of the way is:
- Import all items (without doing any coding)
- Run “get new duplicates” (just once) and use a mixture of manual and automatic marking to evaluate all groups. At the end of this phase, all duplicate groups should be marked as completed.
- Proceed with the rest of the review: should you stumble on some more duplicates, you’ll be able to manually mark them at any time.

I hope this helps,
Sergio

* The one exception is: whenever an item is already present in some duplicates-group, it will always be re-evaluated without exceptions. This is useful to minimise the amount of overlapping groups that would be created when running “get new duplicates” incrementally.
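The selection rule in this footnote can be written down as a short Python sketch. Again, this is only an illustration under assumptions: `items_to_evaluate`, the flag letters and the data layout are hypothetical names, not part of EPPI-Reviewer. It shows that deleted items are skipped unless they already belong to an existing duplicate group.

```python
def items_to_evaluate(flags, existing_groups):
    """Return the ids that a duplicate search would look at.

    flags: dict mapping item id -> "I", "E" or "D" (hypothetical flag letters).
    existing_groups: list of lists of item ids already in duplicate groups.
    """
    already_grouped = {item_id for group in existing_groups for item_id in group}
    return sorted(
        item_id
        for item_id, flag in flags.items()
        # Included and excluded items are always evaluated; deleted items
        # only when they are already part of some duplicate group.
        if flag != "D" or item_id in already_grouped
    )


flags = {1: "I", 2: "E", 3: "D", 4: "D"}
existing_groups = [[1, 4]]  # item 4 is deleted but already sits in a group

candidates = items_to_evaluate(flags, existing_groups)
print(candidates)  # item 3 is skipped, item 4 is kept because of the exception
```

This is also why the first "reset" option matters in the workaround above: once the old groups are gone, the exception no longer pulls the temporarily deleted items back into the search.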

04/04/2011 13:47

Daniel Vujcich

Joined: 15/05/2013

Posts: 8

Re: Duplicates

Thanks for this Sergio. Very useful. I think I understand all of the steps with the exception of step 4, particularly where you say: "You will now be able to walk through all your deleted items and mark them as Excluded once more (if needed)". Can you just clarify this for me? Also can you provide more info on how to use the "Advanced Mark Automatically" function? In particular, once you apply the relevant thresholds, will it just shadow the duplicates in the same way as the basic "Mark Automatically" function?

Best,

Daniel

05/04/2011 11:34

Jeff Brunton

Joined: 17/10/2011

Posts: 594

Re: Duplicates

Hello Daniel,

Step 4 undoes what took place in step 1. In step 1 the excludes were assigned a deleted flag so they would not be looked at during the duplicate checking (which is why you would then have a shorter list of duplicate groups). Once steps 2 and 3 are finished you can go back to the items that you changed to deleted and put them back as excluded. In your particular review I believe there was initially only one item with a delete flag assigned to it, so if you list all of the deleted items (by clicking on D in the Documents tab) and select them all (other than the single item that was originally marked as D), you can then mark them with the Exclude (E) flag by clicking on the 'Assign documents to be included or excluded' icon.

The 'Advanced mark automatically' function allows you to adjust the threshold of what the system will automatically consider to be a duplicate. By default two items need to have a similarity score of 1 for the 'mark automatically' function to work on them. By lowering the threshold, the 'mark automatically' function will consider lower scores when marking items as duplicates. Lowering the score must be done with care: if the score is too low, you can end up with items marked as duplicates that are not actually duplicates.
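To make the effect of the threshold concrete, here is a minimal Python sketch. It is an illustration only: the `mark_automatically` function, the group layout and the example scores are assumptions, not how EPPI-Reviewer stores or processes its data. The one thing taken from the description above is that the default threshold is a similarity score of 1, and that lowering it lets more items through.

```python
def mark_automatically(groups, threshold=1.0):
    """Return ids of group members whose similarity to the master meets the threshold.

    groups: list of dicts with a "master" id and "members" as (item_id, score)
    pairs. This layout is hypothetical and used only for this sketch.
    """
    marked = []
    for group in groups:
        for item_id, score in group["members"]:
            if score >= threshold:
                marked.append(item_id)
    return marked


groups = [
    {"master": 10, "members": [(11, 1.0), (12, 0.92)]},
    {"master": 20, "members": [(21, 0.85)]},
]

strict = mark_automatically(groups)                  # default: exact matches only
relaxed = mark_automatically(groups, threshold=0.9)  # also accepts near matches

print(strict)
print(relaxed)
```

With the default threshold only the exact match is marked. At 0.9 the item scoring 0.92 is marked as well, which is exactly the kind of case that should be checked by eye before trusting a lower threshold.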

Best regards,

Jeff

05/04/2011 11:44

Daniel Vujcich

Joined: 15/05/2013

Posts: 8

Re: Duplicates

Thanks!
