Dear Abdul,
I had a look at your review: your claim that "both articles were identical" worried me. This should never happen: it would be very wrong if two articles with identical "fields" (title, authors, journal, and so on) weren't picked up by the automatic duplicate checking. After looking around for a while, I did find two items that are the same (they represent the same original article), but their fields are quite different, so I can see why this happened.
Forgive me, but I'll need to go into some technical detail to explain this.
First, the two items I've spotted (there may be others) are 7787378 and 7787450; their short titles are "Cooper (2011)" and "James (2011)". If you look at the two of them, you'll notice that both have a very long list of authors, and that item 7787378 has a much longer list than 7787450 (it includes organisation information). Since the article title is relatively short, and there is also a small difference in the journal title, the overall differences were considered too big, and the duplicate detection routine ignored the similarity.

The automatic procedures can't discern similarities that are obvious to humans; they just use a fixed algorithm to calculate a similarity score and work on that. This score is used to decide whether items are possible candidates or not: below a given similarity threshold, items will not be suggested as duplicates. Our job (as the developers) is to set the threshold so that it works well in most situations. This is a difficult task, because a low threshold creates more false positives, which in turn generates more work for all reviewers (who need to sift through more, unnecessary candidates), or leads to actual mistakes, if reviewers "automatically assign" items as duplicates using a low similarity threshold and never check the results in person.
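To make the mechanics concrete, here is a minimal sketch (in Python) of how a score-and-threshold check of this kind could work. This is illustration only, not our actual implementation: the compared fields, their weights, the 0.75 threshold, and the example records are all made up.

    # Toy sketch (NOT the real algorithm): a weighted, field-by-field
    # similarity score compared against a fixed threshold. Field names,
    # weights and the 0.75 threshold are invented for illustration.
    from difflib import SequenceMatcher

    def field_similarity(a: str, b: str) -> float:
        """Similarity ratio in [0, 1] between two field values."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def similarity_score(item1: dict, item2: dict) -> float:
        """Weighted average of per-field similarities."""
        weights = {"title": 0.4, "authors": 0.3, "journal": 0.2, "year": 0.1}
        return sum(
            w * field_similarity(item1.get(f, ""), item2.get(f, ""))
            for f, w in weights.items()
        )

    THRESHOLD = 0.75  # below this, items are never suggested as candidates

    cooper = {
        "title": "Some article title",
        "authors": "Cooper A; James B",  # short author list
        "journal": "J Example Med",
        "year": "2011",
    }
    james = {
        "title": "Some article title",
        "authors": "James B; Cooper A; ... (very long list, with organisations)",
        "journal": "Journal of Example Medicine",
        "year": "2011",
    }

    score = similarity_score(cooper, james)
    print(f"score = {score:.2f}, candidate = {score >= THRESHOLD}")
    # If the much longer author list and the abbreviated journal title
    # drag the weighted score below THRESHOLD, this pair is never
    # suggested, even though a human would spot the match immediately.

Pushing the threshold down would catch pairs like this one, but at the cost of flooding reviewers with false positives: that is exactly the trade-off described above.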
Our approach, therefore, was to accept the risk of letting a few actual duplicates "slip past" the automatic procedure, rather than risk generating too much work or even undue exclusions from a review. The flip side is that your only option for the remaining duplicates is "manual detection". In your case, ordering the main list of documents by title is a fast and effective way to spot possible duplicates visually. [You may want to increase the page size; this is done through the "select the fields you want to display" button on the main documents page (the third button from the right). The maximum is 4000 items per page.]
Once you identify some actual duplicates, you can proceed in two ways: the first is quick and dirty (suitable if you don't need an accurate "duplicates" count), the second a little slower.
First route: delete/exclude the additional copies. You could also create a "duplicates" administrative code and assign items to it; at the end of this exercise you can list all the items with that code and delete/exclude them in one go.
Second route: mark items as duplicates, grouping duplicate items together in the proper way:
- Select the two (or more) duplicate items (making sure you are selecting items that should be part of the same group).
- Click "manage duplicates" and go to the Manual/Advanced tab.
- Click "Find\Find groups that contain the selected Documents".
  - If no group is found, click the Plus button ("Add new Group") and follow the on-screen instructions.
  - If a group is found, click the "Add Selected Item(s)" button. This tries to add the selected items to the selected group; any item that is already part of it will be ignored.
  - If more than one group is found, let us know: I will explain why this may happen and how to sort these cases out (the sketch after this list shows one way to picture it).
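Incidentally, a handy way to picture these groups (this is an analogy of mine, not a description of the software's internals): each group behaves like a disjoint set of items with one representative, and "more than one group found" means your selection spans two sets that may need merging. A toy Python sketch of that idea, where the "Smith" items are invented examples:

    # Illustrative analogy only (an assumption, not the real internals):
    # duplicate groups as disjoint sets, with merging via union-find.
    class DuplicateGroups:
        def __init__(self):
            self.parent = {}  # item -> representative of its group

        def find(self, item):
            """Return the representative of the group containing item."""
            self.parent.setdefault(item, item)
            while self.parent[item] != item:
                self.parent[item] = self.parent[self.parent[item]]
                item = self.parent[item]
            return item

        def add_to_group(self, *items):
            """Put all items in one group, merging groups if needed."""
            root = self.find(items[0])
            for it in items[1:]:
                self.parent[self.find(it)] = root

    groups = DuplicateGroups()
    groups.add_to_group("Cooper (2011)", "James (2011)")    # one group
    groups.add_to_group("Smith (2011)", "Smith (2011) v2")  # another group
    # Selecting one item from each group would surface TWO groups;
    # sorting it out amounts to merging them into one:
    groups.add_to_group("James (2011)", "Smith (2011)")
    print(groups.find("Cooper (2011)") == groups.find("Smith (2011) v2"))  # True

Again, this is just an analogy to show why separate groups can coexist and what "merging" means; in practice, simply let us know if you hit the "more than one group" situation.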
For completeness: you could also proceed in a very different way, that is, "fix" the field differences manually and then re-run the "find new duplicates" function. This would work, but it won't be faster and may create other issues, so I don't normally recommend this approach.
Sorry for the long message, I hope this makes sense to you.
Sergio