
Forum (Archive)

This forum is kept largely for historical reasons and for our latest-changes announcements. (It was focused on the older EPPI Reviewer version 4.)

There are many informative posts and answers to common questions here, but EPPI Reviewer WEB users may find our videos and other resources more useful.

If you do have questions or require support, please email eppisupport@ucl.ac.uk.


Latest Changes (30/06/2020 - V 4.11.4.0)
30/06/2020 16:37
 

Version 4.11.4.0 marks the arrival of the long-awaited new deduplication algorithm. This is a new, custom-made algorithm used to identify duplicates and put them into groups of (possible) duplicates. We believe that this new algorithm is immune to all the major drawbacks of the previous one, and is also significantly more accurate overall. Version 4 also receives a few new features: the RobotReviewer tool is now integrated, allowing users to automate the extraction of Risk of Bias, PICO, and other characteristics from the full-text documents reporting randomised trials in health; alongside this, a similar ‘robot’ is also available, which is being developed as part of the Human Behaviour Change Project (more information is here). Meanwhile, the MAG features (available on invitation) are still growing and maturing quickly. This release also includes a few bug fixes and general enhancements for version 4.

New Deduplication Algorithm (both versions)

This "new feature" is invisible in itself, but is nevertheless quite important in terms of precision, usability and practicality. Until now, EPPI-Reviewer used a pre-packaged "fuzzy grouping" algorithm that created groups of possible duplicates according to its own immutable "logic". This was at the root of the main drawbacks of the old implementation: getting new duplicates could (and frequently did) create "overlapping" groups, which then required users to intervene manually, group by group, on a time-consuming basis. Moreover, the similarity scores were fixed and could not be updated when needed (which would be useful especially when changing the master item). As master items were picked automatically, we could not make the system favour the items that are most likely to be the preferred master from a user perspective. Finally, we had no clear way to tweak the system in order to improve its accuracy and performance.

For all these reasons, we have invested a considerable amount of time in writing our own deduplication algorithm and refining it on the basis of what we have learned over the long history of the previous system. Based on our own internal testing, we believe the new algorithm represents a major improvement. Its main features are:

  1. It is 100% backward compatible with the previous system, making the transition to the new system entirely transparent, as well as reversible.
  2. It never creates "overlapping groups", which will reduce manual labour significantly (especially for review updates).
  3. It is able to recalculate similarity scores "on the fly", which now happens whenever a user changes the master. This means that "auto mark" will now work even if the master has been changed (after this update: this specific feature is not backward compatible).
  4. When a group is created, if any item in the group has been coded already, it will be automatically picked as the master for the new group; otherwise the item that was imported first will be picked.
  5. When two references are not identical, but are similar to the point of being almost guaranteed to be real duplicates, the system will return similarity scores between 0.97 and 0.99. We think that the risk of "losing" false duplicates is negligible for similarity scores at or above 0.97. [We could not make this same statement for the previous version.]
  6. For similarity scores between 0.80 and 0.97, we expect the new system to pick up false duplicates very rarely. Compared with the old system, these occurrences should be far less frequent. As a consequence, "marking automatically" with a threshold set at 0.80 will pick up (and thus hide) significantly fewer false duplicates.
  7. Similarity scores have a new lower bound set at 0.70. Between 0.70 and 0.80 we expect to find some false duplicates. The new system does allow users to "mark automatically" with such low thresholds, but we deem this risky; it should only be done when strictly necessary. (A sketch of how these thresholds could drive grouping and auto-marking follows this list.)
  8. Usability and accuracy should not degrade significantly if "get new duplicates" is triggered after importing new references in cycles (import, get new duplicates, import again, and so on). We believe that the new system's approach of picking a user-friendly "master item" (as opposed to the "best" representative of a group in terms of similarity) has the potential to reduce its effectiveness after many "import and deduplicate" cycles. However, in our own tests, we could not measure any significant degradation; the only shortcoming we could observe was the rare occurrence where two groups were created when it would have been "better" (from a human perspective) to create one group only.
  9. The system has an improved "self-control" set of features, meaning that it should be less prone to failures and should, in most circumstances, be able to "self-recover". The new algorithm is also "aware" of the work already done, meaning that it does not need to re-evaluate items that have been deduplicated already. As a result, performance in terms of "time required to get new duplicates" should improve as well.
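
To make these behaviours concrete, here is a minimal sketch, in Python, of how non-overlapping grouping (point 2), master selection (point 4) and threshold-based auto-marking (points 5-7) could fit together. This is not EPPI-Reviewer's actual implementation: the Item and DuplicateGroup structures, the similarity callback, and the pick_master, build_groups and auto_mark functions are all hypothetical illustrations of the rules listed above.

```python
# Hypothetical sketch of the grouping rules described above; not
# EPPI-Reviewer's actual code.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Item:
    item_id: int            # import order: a lower id means imported earlier
    title: str
    is_coded: bool = False  # True if the item already has coding data

@dataclass
class DuplicateGroup:
    master: Item
    members: List[Item] = field(default_factory=list)  # possible duplicates

def pick_master(candidates: List[Item]) -> Item:
    """Prefer an already-coded item; otherwise take the earliest import."""
    coded = [i for i in candidates if i.is_coded]
    pool = coded if coded else candidates
    return min(pool, key=lambda i: i.item_id)

def build_groups(items: List[Item],
                 similarity: Callable[[Item, Item], float],
                 threshold: float = 0.70) -> List[DuplicateGroup]:
    """Assign every item to at most one group, so groups can never overlap."""
    assigned = set()
    groups = []
    for seed in items:
        if seed.item_id in assigned:
            continue
        cluster = [seed] + [
            other for other in items
            if other.item_id != seed.item_id
            and other.item_id not in assigned
            and similarity(seed, other) >= threshold
        ]
        if len(cluster) > 1:
            master = pick_master(cluster)
            groups.append(DuplicateGroup(
                master=master,
                members=[i for i in cluster if i is not master],
            ))
        assigned.update(i.item_id for i in cluster)
    return groups

def auto_mark(group: DuplicateGroup,
              similarity: Callable[[Item, Item], float],
              threshold: float = 0.97) -> List[Item]:
    """Recompute scores against the current master and return the members
    that clear the auto-mark threshold."""
    return [m for m in group.members
            if similarity(group.master, m) >= threshold]
```

Because each item is assigned to at most one group, overlapping groups cannot arise by construction; and because auto_mark recomputes scores against the current master, automatic marking keeps working after the master is changed (point 3).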

The effects of all these changes are significant for most users. First and foremost, the problems created by overlapping groups meant that it used to be important to import as many search results in a row as possible before triggering "get new duplicates". With the introduction of the new system, we expect that, under normal circumstances, triggering "get new duplicates" more frequently will not generate any problems. As a consequence, performing review updates should become much faster, and "iterative" methodologies are now supported much better than before.
Moreover, the new system is entirely under our control, including how references are grouped, how the master item is chosen and how the similarity scores are calculated. Thus, we will be able to tweak the new system at a much faster pace, if/as needed.

A known shortcoming of the new system is its poor performance when evaluating references written (and imported) in Chinese (presumably, this also applies to similar writing systems). We are concerned about this problem and will endeavour to develop a workaround as soon as possible.
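
Purely as an illustration of one plausible cause (the announcement does not state the actual reason): many string-similarity techniques tokenise text on whitespace, and unsegmented Chinese text then yields a single token, so token-overlap scores collapse to all-or-nothing.

```python
# Hypothetical illustration: whitespace tokenisation produces many tokens
# for an English title but a single token for an unsegmented Chinese one,
# so any token-overlap similarity score becomes all-or-nothing.
title_en = "Effects of exercise on depression in older adults"
title_zh = "运动对老年人抑郁症的影响"

print(title_en.split())  # ['Effects', 'of', 'exercise', ...]: many comparable tokens
print(title_zh.split())  # ['运动对老年人抑郁症的影响']: one token, no partial overlap
```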

Version 4, new features

In EPPI-Reviewer version 4, whenever a (full-text) document is uploaded, it is now possible to utilise the extracted text to "automatically" perform two different kinds of data extraction. This requires importing a "pre-designed" coding tool, designed to allow the saving of automated classification results in an appropriate manner. Users can find these two "robot-template" coding tools in the list of "public" coding tools available for importing into any review.

The two machine learning systems we use behind the scenes are:

  • RobotReviewer, which can locate and extract PICO characteristics and Risk of Bias key "phrases" for Randomised Controlled Trials.
  • Human Behaviour Change robot, which is particularly focused on identifying the specific approaches to behaviour change that are used in interventions evaluated in randomised trials.

More information about these tools can be found here.

Listing machine-learning results by score values. When applying a machine learning model, the results are made available in the search tab, where the corresponding list of items is presented "ordered by score". It is also possible to click on "visualise" to see the distribution of scores in the search result; from this same window, it is now possible to create a "derived" search by specifying custom score thresholds, so as to quickly identify items based on their score range. Previously, this was possible only by browsing through the pages of search results.
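
A minimal sketch of the "derived search" idea, assuming search results are available as simple (item, score) pairs; the derived_search function and the data below are hypothetical, not part of EPPI-Reviewer's interface.

```python
# Hypothetical data: items paired with their machine-learning scores.
scored_results = [
    ("item-001", 0.95),
    ("item-002", 0.62),
    ("item-003", 0.81),
    ("item-004", 0.14),
]

def derived_search(results, low, high):
    """Keep items with low <= score < high, ordered by score (descending)."""
    kept = [(item, s) for item, s in results if low <= s < high]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

print(derived_search(scored_results, 0.60, 0.90))
# [('item-003', 0.81), ('item-002', 0.62)]
```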

In the machine learning window, it is now possible to "rebuild" a custom-made model in order to update it, accounting for additional/changed training data. Previously, the only way to achieve this was to create a new model from scratch.
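
As an illustration of the "rebuild" concept only (this post does not describe EPPI-Reviewer's internal modelling, so the scikit-learn pipeline below is an assumed stand-in): rebuilding amounts to refitting the same model definition on the updated training set, rather than configuring a new model from scratch.

```python
# Hypothetical stand-in for a custom screening model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_model():
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Initial training run.
texts = ["randomised trial of drug A", "qualitative interview study"]
labels = [1, 0]  # 1 = include, 0 = exclude
model = build_model()
model.fit(texts, labels)

# Later, the training data has grown or changed: "rebuild" by refitting the
# same model definition on the full, updated training set.
texts += ["cluster randomised trial of drug B"]
labels += [1]
model.fit(texts, labels)
print(model.predict(["pilot randomised trial of drug C"]))
```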

We expect to make these new features available in EPPI-Reviewer Web at the next update.

Version 4, bug fixes

Refreshing coding tools. In the item details window, it is possible to refresh the list of coding tools and get their latest version from the database at any time. However, when doing this, the coding data for the current item would disappear, and users had to browse back and forth to make the system reload it. This problem is now solved.

Deleting uploaded (full-text) documents. In the item details window, deleting a full-text document after accessing the "Text Document" tab would generate an error, which would subsequently make other functionalities on the same page malfunction. The deletion did happen, but it left EPPI-Reviewer in an unstable state. This problem is now solved.

Future plans

We will naturally keep an eye on how the new deduplication system performs and will be ready to fix any issues that might be discovered through widespread usage. Deduplication is a surprisingly complicated set of functionalities, which means it is possible that some teething problems passed undetected through our testing rounds. Otherwise, we expect to implement the "set up priority screening" features in EPPI-Reviewer Web. At the time of writing, we are still not sure when it will be possible to open up the MAG features to all EPPI-Reviewer users.
