Hi again,
all done. New "fresh" groups are ready in your review.
A few notes:
a) When using the first reset option (the one that keeps the duplicate state of items already evaluated), something a little surprising may happen. Triggering "Get new duplicates" creates a fresh list of groups, with no overlapping groups. Some of these new groups will automatically contain "manually added items": these are items that were already marked as duplicates and that would have been part of the group had they not already been marked. This is expected, and its only effect is to make it relatively easy to un-duplicate these items as well.
b) For clarity's sake: clicking "Get new duplicates" is not equivalent to "Refresh list". The latter fetches the whole list of duplicate groups from the server and is useful whenever one has used the "find" options or wants to be 100% sure that the "Complete" state in the list of groups matches the information on the server.
Clicking "Get new duplicates" triggers a totally different and time-consuming routine: it asks the server to look at all items in the review, compare each of them against all the others, find similarities and group them accordingly. It then looks at the groups already present, merges the latest results with existing groups where possible, and adds new groups as necessary. In terms of processing power this is a very resource-intensive activity, and the cost rises far faster than the number of items and groups that need to be evaluated.
In Trude's case, having more than 200'000 items in the review and some 90'000 groups already present, our server was under considerable stress, eventually timing out. This highlighted a sub-routine that could be optimised: the new version appears to be working well so it will be part of the next update.
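To give a rough feel for why the cost climbs so quickly, here is a small sketch in Python. It is only an illustration: it assumes a naive scheme in which every item is compared once with every other item, the function name pairwise_comparisons is made up for this example, and the real server routine may well avoid some of these comparisons.

```python
def pairwise_comparisons(n_items: int) -> int:
    """Count distinct pairs when each item is compared once with every other.

    Assumption for illustration only: a naive all-against-all comparison,
    not necessarily what the server does.
    """
    return n_items * (n_items - 1) // 2


# Show how the number of comparisons grows with the size of the review.
for n in (1_000, 10_000, 200_000):
    print(f"{n:>7} items -> {pairwise_comparisons(n):,} comparisons")
```

Under that assumption, a review of 200,000 items implies close to 20 billion pairs, before any merging with existing groups is even considered.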
For all users, the take-home message is: you need to click on "Get new duplicates" only in the following cases: 1) when you have added new references to the review and want to eliminate the new duplicates; 2) when you have used the "Reset" functions; or 3) when you have manually deleted some groups and this proved to be a mistake. In all other cases, clicking "Get new duplicates" will not have any positive effect.
The general rule is still the same: to get the best out of our duplicate checking routines, we suggest that you import all items into the review, make sure not to apply any code to them, click "Get new duplicates" just once, use the "mark automatically" feature and evaluate the result. At this stage one should decide whether to use the "Advanced mark automatically" feature or to manually evaluate the remaining groups. Of course, this ideal workflow is not always possible, and our system does allow for other routes.
All this is complex: I am well aware of it, and I'm working on ideas to improve the user interface so as to make manual evaluation much faster (to start with).
Best wishes,
Sergio