Dear Trude,
I've spent some time investigating your review and spotted several issues. Some are a direct consequence of the large number of references, but others may actually be the cause of the review's unmanageable size.
Duplicate checking: the routines returned approximately 300,000 duplicate groups. Unfortunately, unless your PC is *very powerful*, the user interface will be unable to handle such a number and will eventually return an "Out of Memory" error. This is particularly unfortunate because it means you would need us to write a special solution for your case, and I am guessing you would like to avoid the extra cost that comes with this.
Despite this, I had a quick look at what your duplicate groups look like: I think they are pretty comprehensive and should not be missing much. Some manual adjustment may be needed, but that's something we can deal with later on.
Possible import problems: looking at the groups of duplicates, I noticed that most members of a group frequently come from the same "source". This is unusual, because a single source typically represents a single search (or part of one), so I took the initiative and had a look at your sources as well. The first batches of imports look perfectly all right to me, but the last ones (Feb 2012) seem to be affected by some problems:
Many of the individual sources appear to contain duplicates, but the real problems are in the "Wok" sources. These all share the same two issues:
1) They all have a large number of duplicates, both within the same source and across different sources. This means that many duplicate groups will be large, containing multiple copies of what seem to be exactly the same reference.
2) I searched for a while but couldn't find any "Wok" reference that has an abstract (i.e. all the abstract fields appear to be empty).
This situation makes me think that there is some kind of problem with your general search/import strategy (why do many searches include the same item multiple times?) and/or the import routines.
All in all, the anomalies explained above may mean that we don't have to deal with such large numbers, and that we should fix whatever went wrong at the importing stage instead of trying to cope with all these duplicates. This hope is reinforced by the fact that you now have 1,581,925 "group members", meaning that we may be dealing with something like 400,000-600,000 genuine items, and that some 1.2 million items are actually duplicates.
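As a rough sanity check on these figures, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it assumes that every duplicate group contains exactly one genuine item, it uses the approximate group count quoted earlier in this message rather than an exact one, and the function name is purely illustrative.

```python
def estimate_redundant_copies(group_members: int, duplicate_groups: int) -> int:
    """Copies that would disappear if each group were collapsed to a single item.

    Assumption: every duplicate group holds exactly one genuine reference.
    """
    return group_members - duplicate_groups


GROUP_MEMBERS = 1_581_925    # "group members" reported for the review
DUPLICATE_GROUPS = 300_000   # approximate number of duplicate groups (not an exact count)

redundant = estimate_redundant_copies(GROUP_MEMBERS, DUPLICATE_GROUPS)
average_group_size = GROUP_MEMBERS / DUPLICATE_GROUPS

print(f"Redundant copies: about {redundant:,}")
print(f"Average group size: about {average_group_size:.1f}")
```

Under these assumptions the estimate comes out at well over a million redundant copies, and an average of more than five members per group, which is consistent with the large groups of near-identical references described above.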
What do you think? As this is a rather peculiar situation, please feel free to contact us directly through eppisupport@ioe.ac.uk: I doubt that this discussion will be particularly interesting for our typical user.
Best wishes,
Sergio