HelpForum

Forum (Archive)

This forum is kept largely for historic reasons and for our latest changes announcements. (It was focused around the older EPPI Reviewer version 4.)

There are many informative posts and answers to common questions, but you may find our videos and other resources more useful if you are an EPPI Reviewer WEB user.

Click here to search the forum. If you do have questions or require support, please email eppisupport@ucl.ac.uk.

<< Back to main Help page

Home > Using EPPI-Revi... > Questions about... > time out when refreshing duplicate list
New Post
10/08/2011 16:18
 

Hi,

 

I have added new sources to my review 'test', and when I click on managing duplicates and the programme starts to refresh the duplicate list, it keeps giving me an error message that says "Data portal.fetch failed (Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding)."

Is there anything I can do to rectify this?

 

Thank you,

Trude

 
New Post
10/08/2011 17:05
Accepted Answer 

Hi Trude,

I'm looking at this right now. Please don't try to open the duplicates window for the time being. It's clearly a performance issue and I want to be sure we don't try to do the same things at the same time.

Thank you and sorry for the inconvenience,

Sergio

 
New Post
11/08/2011 13:53
 

Hi Sergio,

 

I tried again today and it seems to be working ok. Please let me know if I should not use the duplicates window.

 

Thank you for the help.

 

Trude

 
New Post
11/08/2011 14:07
 

Dear Trude,

Yes, it should be all right now.

Sorry for the delay; I was just finishing checking this. Note that you have loads of 'overlapping groups', so you may want to reset all your duplicate data (retaining the work already done). Please check the Duplicate Checking section in the manual if this is not clear.

For everyone (and Trude as well):
I've done a big job changing one of the things that happens while "getting new duplicates"; performance should be much better with the new procedures (in Trude's case, and it is a very rare one, what used to take 18 minutes now takes about 3 seconds). I have not published this yet because I need to go through many testing procedures, and I certainly don't want to introduce new bugs. We are aiming to publish an update early next week; if all goes well, this new optimised procedure will be included at that time.

Trude: in your case, having many existing groups and being about to create many more, the system was making the wrong guess (database procedures try to self-optimise) and doing the job in such a way that it always timed out (it took 18 minutes, whereas under normal conditions the timeout happens after 30 seconds). If you want to reset your duplicate data and then "get new duplicates", you may prefer to wait for the new procedures to be published; otherwise you may get stuck again.

I'm sorry for the inconvenience.

Sergio

 
New Post
11/08/2011 14:47
 

Dear Sergio,

 

Thank you so much for all your help.

 

I really need to continue my work with this review so if the new procedures are published next week I will have to try resetting before that.

 

In the manual, two ways of resetting are described, and it seems to me that the first is what you are suggesting. Am I right?

 

Thank you again,

Trude

 
New Post
11/08/2011 18:47
 

Hi Sergio,

 

The second time I went to the managing duplicates window I got the same problem. I want to reset the duplicates but can't as it times out again.

 

Could you help me again? I am sorry about this, but I did it before you sent the message.

 

Trude

 
New Post
11/08/2011 19:50
 

Hi Trude,

I'll have a go tomorrow, sorry for the delay.

Best wishes,

Sergio

 
New Post
12/08/2011 11:41
 

Hi,

I've reset the duplicate checking info and triggered "get new duplicates" with the new procedure for you. Will let you know when it's done.

Best wishes,

Sergio

 
New Post
12/08/2011 12:36
 

Hi again,
all done. New "fresh" groups are ready in your review.

A few notes:

a) When using the first reset option (the one that keeps the duplicate state of items already evaluated), something a little surprising happens. Triggering "Get new duplicates" creates a fresh list of groups with no overlapping groups. Some of these new groups will automatically have "manually added items": these are the items that were already marked as duplicates and that would have been part of the group had they not already been marked. This is expected, and its only effect is to make it relatively easy to un-duplicate these items as well.

b) For clarity's sake: clicking "Get new duplicates" is not equivalent to "Refresh list". The latter gets the whole list of duplicate groups from the server and is useful whenever one has used the "find" options or wants to be 100% sure that the "Complete" state in the list of groups matches the information on the server.

Clicking "Get new duplicates" triggers a totally different and time-consuming routine: it asks the server to look at all items in the review, compare all of them against all the others, find similarities and group them accordingly; then it looks at already present groups, when possible merges the last results with existing groups, and adds new groups as necessary. In processing power terms this is a very resource intensive activity, and the cost increases geometrically with the number of items and groups that need to be evaluated.

In Trude's case, having more than 200'000 items in the review and some 90'000 groups already present, our server was under considerable stress and eventually timed out. This highlighted a sub-routine that could be optimised; the new version appears to be working well, so it will be part of the next update.
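To put those numbers in perspective, here is a small back-of-the-envelope calculation (my own illustration, not from the product) of how many item-vs-item comparisons a naive all-pairs check implies:

```python
def pair_count(n: int) -> int:
    """Number of comparisons in an all-pairs check: n choose 2."""
    return n * (n - 1) // 2

# Doubling-plus the item count multiplies the work enormously:
for n in (1_000, 10_000, 200_000):
    print(f"{n:>7} items -> {pair_count(n):>14,} comparisons")
# 200'000 items imply roughly 20 billion candidate pairs,
# which is why a mis-optimised database plan can easily time out.
```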

For all users, the take-home message is: you need to click "Get new duplicates" only in the following cases: 1) you have added new references to the review and want to eliminate the new duplicates; 2) you have used the "Reset" functions; or 3) you have manually deleted some groups and this proved to be a mistake. In all other cases, clicking "Get new duplicates" will not have any positive effect.

The general rule is still the same: to get the best out of our duplicate-checking routines, we suggest importing all items into the review, making sure not to apply any codes to them, clicking "Get new duplicates" just once, using the "Mark automatically" feature, and evaluating the result. At that stage one should decide whether to use the "Advanced mark automatically" feature or to evaluate the remaining groups manually. Of course, this ideal workflow is not always possible, and our system does allow for other routes.

All this is complex, I am well aware of it, and I'm working on ideas to improve the user interface so as to make manual evaluation much faster (to start with).

Best wishes,

Sergio

 
New Post
23/03/2012 14:06
 

Dear Sergio,

 

I am again having issues with duplicate checking. I re-ran the check for new duplicates after having checked 27404 groups, and the problem is that I noticed (even just by looking at the first page of the document list) that the programme failed to pick up many duplicates. The programme has now been checking for duplicates for 5 hours but has still not finished. Do you have any advice on how to make the programme pick up the missed duplicates?

Thank you very much for any help.

 

Kind regards,

Trude Sundberg

 

