Version 6.17.0.0 is a major release: it includes an entirely new system to generate (prioritised) screening lists, along with a number of other improvements to the Screening facilities. It also includes support for two more LLMs (Llama and Mistral).
New Feature: ordered screening list from ranked searches
In EPPI Reviewer, Priority Screening helps coordinate the screening work by feeding the reference that is "most likely to be included" to the next review member who asks for the "next item to screen". The algorithm that chooses which item to show the current user is driven by the machine-learning "scores" (from the active model) applied to items. It has been refined over years of development to maximise its efficiency, taking into account how many people are participating. In addition, it provides auto-reconciliation facilities to speed up multiple independent screening, as well as an optional "auto-exclude" function.
EPPI Reviewer also provides a number of ways to produce ordered lists, where items can be ranked by scores obtained by applying machine-learning classifiers and similar mechanisms.
This version introduces a new system that makes it possible to use these ordered lists (in the form of "Search Results") for screening, with the same logic and facilities available through Priority Screening. We called this new feature "From Search screening list", shortened to "FS" or "FS list" throughout the user interface.
Features list:
- The "From Search (FS)" and "Priority screening (PS)" lists can coexist and, in fact, share some crucial settings (labelled as "review wide"). These are:
  - The screening tool - this is mandatory and there can be only one: if both lists are present, the same tool applies to both. Because Priority Screening learns from codes that belong to this tool, when a PS list is configured the screening tool must be changed from the "configure PS" facilities, which ensures that the "training codes" can be updated accordingly.
  - Reconciliation mode - controls if/when coding gets auto-completed.
  - Auto-exclude setting - assigns the "E" (for "Excluded") flag to items coded with exclude codes.
- Two separate "wizards", one for each list, are the recommended way to configure either feature, especially for new users. An "edit all settings" option is also provided, separately for the PS and FS lists.
- The "Start Screening" buttons for the two lists are separate; each is visible and enabled whenever the current user has items to screen in the respective list.
- The FS list can be created from any type of search results for which the "Classifier" flag is set to true.
- Combining two "Classifier" search results with the AND operator combines their scores and retains the "Classifier" flag; such search results can therefore be used to create FS screening lists too.
- Review Administrators can create a new FS list at any time by selecting a new set of "search results".
- Deleting an FS list does not delete the historical progress data, but wipes the list, making the corresponding "Start Screening" button disappear.
- In the "Item Details" page, when it is accessed via either the PS or FS list, the header contains a "PS/FS" label that shows which list is being used.
- Progress for FS lists is updated automatically (as for PS lists) and can also be updated manually at any time. Moreover:
  - Progress figures can be shown "review wide" (considering all items in the review, even if they are not in the current list) or "within list" (considering only items in the current list). Users can switch between these two kinds of figures at any time.
  - By default, progress is shown as a graph for the current/last list, but progress curves from previous lists remain available.
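The difference between the two kinds of progress figures can be illustrated with a minimal Python sketch. The Item record and its fields are hypothetical and do not reflect EPPI Reviewer's internal data model; the point is only that "review wide" figures count every item in the review, while "within list" figures count only the items that belong to the current list.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Item:
    item_id: int
    in_current_list: bool  # hypothetical flag: item belongs to the current PS/FS list
    screened: bool
    included: bool


def progress(items: Iterable[Item], within_list: bool) -> Tuple[int, int, int]:
    """Return (screened, included, total) for the chosen scope."""
    # "Review wide" keeps every item; "within list" keeps only current-list items.
    pool = [i for i in items if i.in_current_list or not within_list]
    screened = sum(1 for i in pool if i.screened)
    included = sum(1 for i in pool if i.screened and i.included)
    return screened, included, len(pool)
```

An item screened outside the current list (for example, via an earlier list) therefore counts towards the review-wide figures but not towards the within-list ones.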
New Feature: ranked search results from the Open Alex "Auto-update" feed
Many living (or similar) reviews rely on the Open Alex "Keep up to date" feed of new references that might be relevant to the review. This feed is driven by a sophisticated machine-learning model that relies on the dense network of relations between Open Alex references. Until now, users could leverage the "Scores" produced by this model only by setting an "import threshold" when importing results; after that, the scores were no longer accessible from within EPPI Reviewer. This limitation is now removed: "ranked search results" can be created from any "Auto-update / keep up to date" set of results.
Details:
- The search results will include any item in the review that is "matched" to a reference included in the Auto-update results, even if that item was imported from an unrelated source.
- If two or more items are matched to the same Open Alex unique identifier, only one of them is included in the search results: the oldest, as it is the most likely to already be associated with other review data.
- If the Auto-update results have been classified with a "study type" classifier and/or a custom classifier, users can decide, when creating the search results, which set of scores to use, or combine 2 or 3 sets of scores. Combined scores are computed as a simple average, giving equal weight to each set of scores.
- Since the Auto-update scores are always present, by default the search results will use these scores, unless explicitly configured otherwise.
Combined with the new "From Search" screening list feature, this allows users to screen items imported from the Auto-update feed in the order defined by the Auto-update model, and/or in combination with other scores.
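The rules above (keep only matched items, keep the oldest item per Open Alex ID, average the chosen score sets with equal weights, then rank) can be sketched in Python. This is a minimal illustration under stated assumptions, not EPPI Reviewer's implementation: the ReviewItem record, the use of an ISO date string to find the oldest item, and the representation of each score set as a dictionary keyed by Open Alex ID are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class ReviewItem:
    item_id: int
    openalex_id: Optional[str]  # hypothetical: Open Alex ID the item is matched to, if any
    date_added: str             # ISO date string, so string order is chronological


def ranked_results(
    review_items: Sequence[ReviewItem],
    autoupdate_scores: Dict[str, float],
    other_scores: Sequence[Dict[str, float]] = (),
    use_autoupdate: bool = True,
) -> List[Tuple[int, float]]:
    # 1. Keep only matched items and, for each Open Alex ID, the oldest matching item.
    oldest: Dict[str, ReviewItem] = {}
    for item in review_items:
        oa_id = item.openalex_id
        if oa_id is None or oa_id not in autoupdate_scores:
            continue
        kept = oldest.get(oa_id)
        if kept is None or item.date_added < kept.date_added:
            oldest[oa_id] = item
    # 2. Average the chosen score sets, each with equal weight.
    score_sets = ([autoupdate_scores] if use_autoupdate else []) + list(other_scores)
    results = []
    for oa_id, item in oldest.items():
        values = [scores[oa_id] for scores in score_sets if oa_id in scores]
        if values:
            results.append((item.item_id, mean(values)))
    # 3. Highest (combined) score first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

By default only the Auto-update scores are used; passing, for example, a hypothetical dictionary of "study type" scores in `other_scores` averages it with the Auto-update scores.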
New Feature: hide Priority Screening "progress" datapoints
When Priority Screening is used in an unconventional and/or non-linear manner, or when batch-screening operations are done/undone, the PS progress curve can become overly complicated, incomprehensible and/or misleading. As a result, users quite often contact EPPISupport asking for the progress history (or parts of it) to be deleted. We are always happy to oblige, with the caveat that this operation deletes data permanently. We have therefore written a new feature that allows users to hide data points from the progress graph and table. It is available through the new "Edit" button placed above the (PS) progress table, and lets users hide/unhide individual progress points without deleting them permanently.
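The difference between hiding and deleting can be shown with a minimal Python sketch, assuming a hypothetical ProgressPoint record with a "hidden" flag: hidden points stay stored and can be restored, while only visible points would be plotted and listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgressPoint:
    point_id: int
    screened: int         # items screened when the point was recorded
    included: int         # items included at that time
    hidden: bool = False  # hidden points stay stored but are not shown


def set_hidden(points: List[ProgressPoint], point_id: int, hidden: bool) -> None:
    """Hide or un-hide one progress point; nothing is ever deleted."""
    for point in points:
        if point.point_id == point_id:
            point.hidden = hidden
            return
    raise KeyError(f"no progress point with id {point_id}")


def visible(points: List[ProgressPoint]) -> List[ProgressPoint]:
    """Points that would be shown in the progress graph and table."""
    return [p for p in points if not p.hidden]
```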
Screening features: other improvements
The EPPI Reviewer user interface is designed to discourage creating and keeping contradictory settings; however, because of the high degree of freedom it offers, it cannot make contradictory configurations impossible. To mitigate this, the PS "show progress and settings" screen now runs an additional check whenever it is opened (provided the current user has enough rights to save configuration changes). This check ensures that "N. of people that need to screen each item" matches the "Data entry mode" of the currently configured screening tool. If there is a mismatch (e.g. the screening tool is in "Comparison" data entry mode, but the screening configuration is set to "1 person", or vice versa), EPPI Reviewer will offer to automatically correct and save the "N. of people" setting.
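The consistency check can be sketched as a small Python function. The mode name "Normal" for the non-comparison case and the value proposed for "Comparison" mode (2 people) are assumptions made for illustration; the release notes only state that "Comparison" mode with "1 person", or the reverse, counts as a mismatch.

```python
from typing import Optional


def proposed_n_people(data_entry_mode: str, n_people: int) -> Optional[int]:
    """Return a corrected "N. of people" value, or None if the settings agree."""
    comparison = data_entry_mode == "Comparison"
    if comparison and n_people == 1:
        return 2  # assumption: propose two independent screeners
    if not comparison and n_people > 1:
        return 1  # single-person ("Normal", assumed name) mode needs one screener
    return None
```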
Priority Screening automatically triggers new training rounds when items with specific "trigger" ranks are sent to users for screening. The frequency of such training rounds depends on how many items have already been screened, and decreases over time. Until now, training rounds were triggered far too often at the beginning of screening, and their frequency then decreased rapidly. We have reduced the frequency of training rounds when fewer than 1000 items have been screened, and made the decrease in frequency less steep. Training now happens every 25 items when fewer than 250 items have been screened, with the interval gradually increasing to every 400 items once more than 15000 items have been screened.
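The new schedule can be sketched in Python. Only the two endpoints come from the text above (every 25 items below 250 screened, every 400 items above 15000); the geometric interpolation used in between is a hypothetical stand-in for the "gradual" decrease, not the actual EPPI Reviewer formula.

```python
import math
from typing import List

MIN_INTERVAL, MAX_INTERVAL = 25, 400  # items between training rounds (from the notes)
LOW, HIGH = 250, 15000                # screened-item thresholds (from the notes)


def training_interval(n_screened: int) -> int:
    """Items to screen before the next training round."""
    if n_screened < LOW:
        return MIN_INTERVAL
    if n_screened > HIGH:
        return MAX_INTERVAL
    # Hypothetical: interpolate geometrically between the two known endpoints.
    t = math.log(n_screened / LOW) / math.log(HIGH / LOW)
    return round(MIN_INTERVAL * (MAX_INTERVAL / MIN_INTERVAL) ** t)


def trigger_ranks(up_to: int) -> List[int]:
    """Ranks at which a new training round would be triggered."""
    ranks, rank = [], 0
    while True:
        rank += training_interval(rank)
        if rank > up_to:
            return ranks
        ranks.append(rank)
```

Any monotonic curve between the two endpoints would satisfy the description; a geometric one is used here only because it keeps the early increase gentle.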
New LLM robots: Llama and Mistral
We have added Llama 3.1 and Mistral Large (24.11) to the list of LLMs that can be used by EPPI Reviewer for robot-coding. Llama's cost is £4 per million input tokens and £12 per million output tokens. Mistral costs £2 per million input tokens and £5 per million output tokens. Both models are (as usual) provided to allow evaluations and comparisons to be carried out. More information about all models is here.
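Since both models are priced per million input and output tokens, the cost of a coding job can be estimated with simple arithmetic. In this Python sketch the prices are those listed above, while the batch size and the per-item token counts are hypothetical; real token counts depend on the prompt and on the length of each item.

```python
# Prices listed above, in GBP per million tokens: (input, output).
PRICES_GBP_PER_MILLION = {
    "Llama 3.1": (4.0, 12.0),
    "Mistral Large (24.11)": (2.0, 5.0),
}


def estimated_cost_gbp(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES_GBP_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical batch: 500 items, ~3000 input and ~200 output tokens per item.
n_items, tokens_in, tokens_out = 500, 3000, 200
for model in PRICES_GBP_PER_MILLION:
    cost = estimated_cost_gbp(model, n_items * tokens_in, n_items * tokens_out)
    print(f"{model}: £{cost:.2f}")
```

With these assumed figures (1.5 million input and 0.1 million output tokens), the batch would cost about £7.20 with Llama and £3.50 with Mistral.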
"Free text" search: safety improvement
EPPI Reviewer allows users to search for free text in several fields. Searches that target "(Title and) abstract" or the full text are silently expanded to account for inflections and similar small variations. This means that the original search string gets "digested" on submission, becoming longer in the process. A maximum number of characters applies to the "digested" search string, so long searches could be silently truncated, ignoring the last terms of the original search string and producing misleading results. We have now implemented two new mechanisms to guard against this problem:
- The search text box accepts up to 2000 characters, roughly half the maximum length allowed for the resulting "digested" string. We believe this will avoid undue truncation of search strings in most cases.
- When the search text entered is longer than 1000 characters, a "possible truncation" warning is shown.
Users can of course still produce complex Boolean searches via the "Combine" functionalities.
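The two safeguards amount to a simple length rule, sketched here in Python. The function name is hypothetical; the limits (2000 characters accepted, warning above 1000) are the ones described above, and slicing stands in for a text box that stops accepting input at its maximum length.

```python
from typing import Tuple

MAX_INPUT_CHARS = 2000   # the text box accepts at most this many characters
WARN_ABOVE_CHARS = 1000  # longer searches show a "possible truncation" warning


def accept_search_text(text: str) -> Tuple[str, bool]:
    """Return (accepted text, whether to show the truncation warning)."""
    accepted = text[:MAX_INPUT_CHARS]
    return accepted, len(accepted) > WARN_ABOVE_CHARS
```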
Configurable reports: better handling of deleted codes
Configurable reports require users to select nodes in the coding tool trees to designate "answer" or "question" codes/nodes. However, coding tools allow users to delete nodes at any time, which can invalidate the selections saved in configurable reports. Until now, the user interface gave no visual cue for such cases, and running affected reports could make "deleted data" resurface unexpectedly. When editing configurable reports, the user interface now shows explicitly whether a given code included in the report has been deleted. Running reports that include deleted codes will no longer show data that applies to the deleted codes.
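Both behaviours can be sketched in Python, assuming a hypothetical ReportCode record carrying a "deleted" flag and coded items stored as a dictionary from code ID to item IDs: deleted codes are labelled explicitly while editing, and their data is skipped when the report runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ReportCode:
    code_id: int
    name: str
    deleted: bool = False  # hypothetical: set when the node is removed from the coding tool


def label_codes(codes: Sequence[ReportCode]) -> List[str]:
    """Labels shown while editing a report; deleted codes are flagged explicitly."""
    return [f"{c.name} [deleted]" if c.deleted else c.name for c in codes]


def report_data(
    codes: Sequence[ReportCode], coded_items: Dict[int, List[int]]
) -> Dict[int, List[int]]:
    """Coded items per code, skipping deleted codes and codes not in the report."""
    live = {c.code_id for c in codes if not c.deleted}
    return {cid: items for cid, items in coded_items.items() if cid in live}
```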
BugFix: managing LLM-coding batches
When processing LLM-coding requests (per item or in batches), EPPI Reviewer contacts the API of the chosen model as made available through the Microsoft Azure platform. The EPPI Centre does not control these APIs, and we found that on rare occasions they might become unresponsive, even for extended periods of time (hours). Until this release, EPPI Reviewer did not handle these situations well.
We have updated how EPPI Reviewer reacts to such failures in the following way:
- When a request gets no reply, EPPI Reviewer will retry it up to 4 additional times.
- Each retry gives the LLM a little more time to respond.
- If the fourth additional attempt fails and the request is part of a batch, the whole batch fails and reports both the reason and the ItemId of the item that could not be processed.
- If any one of the retries succeeds, the batch proceeds as normal.
This ensures that end users have full visibility of what happened and can recover from prolonged API failures effectively (via the "Past jobs" log). It also makes sure that no single failing API can slow down the jobs queue for very long periods.
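The retry policy can be sketched in Python. The base timeout (60 seconds) and the fixed 30-second increase per retry are hypothetical values standing in for "a little more time"; the structure (one initial attempt plus up to 4 retries, and a batch that stops at the first item that still fails, reporting its ItemId and the reason) follows the description above. The `call` argument is a hypothetical function that sends one item to the LLM with a given timeout.

```python
from typing import Callable, Dict


class LLMTimeout(Exception):
    """Raised by the (hypothetical) API call when the model does not reply in time."""


class BatchFailed(Exception):
    def __init__(self, item_id: int, reason: str):
        super().__init__(f"item {item_id} failed: {reason}")
        self.item_id = item_id
        self.reason = reason


def call_with_retries(
    call: Callable[[int, float], str],
    item_id: int,
    base_timeout: float = 60.0,     # hypothetical
    extra_per_retry: float = 30.0,  # hypothetical
    max_retries: int = 4,
) -> str:
    last_error = "no attempt made"
    for attempt in range(max_retries + 1):  # first attempt + up to 4 retries
        timeout = base_timeout + attempt * extra_per_retry  # a bit more time each retry
        try:
            return call(item_id, timeout)
        except LLMTimeout as err:
            last_error = str(err)
    raise BatchFailed(item_id, last_error)


def run_batch(item_ids, call: Callable[[int, float], str]) -> Dict[int, str]:
    """Process items in order; the first item that keeps failing stops the batch."""
    return {item_id: call_with_retries(call, item_id) for item_id in item_ids}
```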
Account Manager: small improvements
When assigning credit and selecting a "shared review", the table listing review members had a column for their expiry date, which was left blank. The correct dates now show in that column, making it clear which accounts are expired/about to expire and may thus need a top-up.
The "outstanding fee" feature had the built-in limitation that it would accept only multiples of £5 as the billed amount. This limitation is now removed, and the Account Manager now allows any integer value as the "amount due".