Automated coding (using Large Language Models)

Automated screening and data extraction using Large Language Models

EPPI Reviewer lets you query Large Language Models (LLMs) directly from within the user interface.
There are many possible uses for LLMs in systematic reviews, and they need to be evaluated carefully before being used widely. For this reason, this feature should be used for evaluation purposes only. We do not currently recommend using it in real reviews unless the accuracy of its results has first been formally tested.

The feature is only available in reviews where it has been enabled by purchasing Credit and assigning said credit for use on LLMs in a given review. Credit can be purchased from the Account Manager, and then assigned to reviews (by clicking "Edit" for a given review, in the "Summary\Reviews" tab).

EPPI Reviewer implements a number of different LLMs, which is essential for conducting studies of how well they perform. For this reason, we expect the list of available LLMs to change over time: LLMs that prove unreliable may be removed, and more or newer LLMs will be added. If you wish to conduct formal evaluations, please contact EPPI Support: we are conducting pre-registered studies for this purpose, and you may be interested in participating in or contributing to this research.

Once LLMs are enabled in a given review, follow these steps to use them:

1. Create a code set

Create a code set using the standard tools and add the codes you want to extract information about (e.g. population and intervention characteristics). The text in the 'description' box is the prompt for the language model, and must follow a pre-specified format:

label: data type // prompt

The label should be a short, informative word or two describing the data to be extracted or identified. The data type must be one of three types: number, boolean or string. If you specify number, the model will attempt to extract numeric information into the 'info' box. If you specify string, the model will write the extracted text into the 'info' box. If you specify boolean, the model will check whether a given characteristic is present, and the box will be ticked or left un-ticked accordingly. The text after the // is the prompt, which should ask the model to find the specified information.

For example:

population_description: string // extract a detailed summary of the population characteristics

This will cause the model to summarise the population characteristics into the 'info' box.

population_number: number // the total number of participants in the study

This will cause the model to insert a number into the info box - the number of people in the study.

physical_activity: boolean // did the intervention contain a physical activity component?

This will cause the model to look for physical activity in the text describing the intervention. If it is found, then the box will be ticked. (And not ticked, if it wasn't found.)
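Before running the model, it can help to check that every description follows the format above. The following is a minimal sketch of such a check in Python. It is not EPPI Reviewer's own parser (which is not shown here); the names `parse_description` and `CodePrompt` are hypothetical, and treating the data type as case-sensitive is an assumption.

```python
import re
from typing import NamedTuple

# The three data types named in this guide.
ALLOWED_TYPES = {"number", "boolean", "string"}


class CodePrompt(NamedTuple):
    """Hypothetical container for one parsed description."""
    label: str
    data_type: str
    prompt: str


# label, then ':', then the data type, then '//', then the prompt text.
_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(\w+)\s*//\s*(.+?)\s*$", re.DOTALL)


def parse_description(description: str) -> CodePrompt:
    """Split a description of the form 'label: data type // prompt'."""
    match = _PATTERN.match(description)
    if match is None:
        raise ValueError("expected 'label: data type // prompt'")
    label, data_type, prompt = match.groups()
    if data_type not in ALLOWED_TYPES:
        # Assumption: types are written exactly as in the guide (lower case).
        raise ValueError(f"unknown data type: {data_type!r}")
    return CodePrompt(label, data_type, prompt)


parsed = parse_description(
    "physical_activity: boolean // did the intervention contain a physical activity component?"
)
print(parsed.label, parsed.data_type)
```

A description with a missing `//` or an unsupported data type raises `ValueError`, which makes it easy to spot a malformed code before any credit is spent.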

2. Ensure that the correct text is present in the abstract field

Currently, the model works on the text in the abstract field, which can be quite long: up to 3,500 words. You can either simply use abstracts, or copy and paste relevant sections of papers into the abstract field. Make sure the text has been saved before running the model.
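If you paste sections of a paper into the abstract field, a quick length check can be useful. This is a rough sketch only: it counts whitespace-separated words, whereas the way EPPI Reviewer itself counts words is not documented here, so treat the result as approximate. The function names are hypothetical.

```python
MAX_WORDS = 3500  # limit quoted in this guide


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an approximation of 'words')."""
    return len(text.split())


def fits_abstract_field(text: str, limit: int = MAX_WORDS) -> bool:
    """Return True if the text is within the word limit."""
    return count_words(text) <= limit


pasted_text = "Participants were adults aged 18 to 65 recruited from primary care."
print(count_words(pasted_text), fits_abstract_field(pasted_text))
```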

3. Run the model

Once you have a data extraction / keywording tool prepared using the structure above, and you have the right text in the abstract field, it's time to run the model. Select the title of your code set on the left of the screen, and the 'auto-coding with GPT-4' button should appear. Click the button and wait a few seconds; the results should then appear in your code set.

4. Evaluate results

As mentioned above, this feature is currently for evaluation only, so please do evaluate your results.
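One simple way to evaluate boolean codes is to compare the robot's ticks against "gold standard" human coding for the same items. The sketch below assumes you have exported both sets of coding into dictionaries keyed by item ID (the export step and the variable names are assumptions, not part of EPPI Reviewer), and computes precision and recall.

```python
from typing import Dict, Optional


def confusion_counts(human: Dict[int, bool], robot: Dict[int, bool]) -> Dict[str, int]:
    """Count agreements and disagreements on items coded by both."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for item_id in human.keys() & robot.keys():
        gold, predicted = human[item_id], robot[item_id]
        if predicted and gold:
            counts["tp"] += 1  # both ticked
        elif predicted:
            counts["fp"] += 1  # robot ticked, human did not
        elif gold:
            counts["fn"] += 1  # human ticked, robot did not
        else:
            counts["tn"] += 1  # neither ticked
    return counts


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return the ratio, or None when it is undefined."""
    return numerator / denominator if denominator else None


def precision(counts: Dict[str, int]) -> Optional[float]:
    return _ratio(counts["tp"], counts["tp"] + counts["fp"])


def recall(counts: Dict[str, int]) -> Optional[float]:
    return _ratio(counts["tp"], counts["tp"] + counts["fn"])


# Illustrative data only: item ID -> whether the code was ticked.
human_coding = {1: True, 2: False, 3: True, 4: True}
robot_coding = {1: True, 2: True, 3: False, 4: True}
counts = confusion_counts(human_coding, robot_coding)
print(counts, precision(counts), recall(counts))
```

Items coded by only one of the two are ignored here; a formal evaluation would normally report how many such items there were.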

How it works:

Behind the scenes, EPPI Reviewer collects the prompts present in the selected coding tool and uses them to produce a single request to the GPT4 API, including the prompts, instructions, and the Title and Abstract of the Item in question. Upon receiving the response, it applies the relevant coding, including the "additional text" returned by the API, as if a person were ticking the code checkbox and typing the additional text in the "info box".
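The flow described above can be pictured with the following Python sketch. It is illustrative only: the real request layout, the wording of the instructions and the shape of the API response are not documented here, so the payload keys, the placeholder instruction text and the functions `build_request` and `to_coding` are all assumptions.

```python
import json
from typing import Any, Dict, List


def build_request(descriptions: List[str], title: str, abstract: str) -> str:
    """Combine every prompt and the item's text into one request body."""
    payload = {
        # Placeholder wording; the actual instructions are not published here.
        "instructions": "Answer each prompt using only the title and abstract.",
        "prompts": descriptions,
        "title": title,
        "abstract": abstract,
    }
    return json.dumps(payload)


def to_coding(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a (hypothetical) label -> value response into coding actions."""
    coding = []
    for label, value in response.items():
        if value is False or value is None:
            continue  # nothing found: the code is left un-ticked
        # A boolean 'True' ticks the box; anything else also fills the info box.
        info = "" if value is True else str(value)
        coding.append({"code": label, "ticked": True, "info": info})
    return coding


request_body = build_request(
    ["population_number: number // the total number of participants in the study"],
    title="An example trial",
    abstract="We randomised 120 adults to exercise or usual care.",
)
print(to_coding({"population_number": 120, "physical_activity": True}))
```

Sending one request per item, rather than one per code, keeps all prompts in the same context alongside the title and abstract.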

As of version 6.15.3.0, the initial implementation has been extended in the following ways:

Coding added by GPT4 is now added in the name of the robot itself ("OpenAI GPT4"). This ensures it is always possible to discriminate between coding produced by the robot and coding produced by humans.
Adding the coding in the robot's name also allows you to set the relevant coding tool in "comparison mode", making the machine add its coding as "incomplete", and thus facilitating the comparison against "gold standard" coding (produced by actual people).
By default, if the robot is asked to code an item when coding is already present (and completed) in the name of someone else, the robot will produce its coding as "incomplete", which hides it "behind" the complete version. This is the default behaviour because it guarantees that it will always be possible to discriminate between coding produced by the robot and coding produced by humans. The option to change this behaviour is presented automatically when appropriate: un-ticking the "Always add coding in the Robot's name" option will make the resulting codes appear as part of the already present (and completed) coding, thus "filling in" the gaps (if any) without overwriting any coding already present.
For the same reason, by default, the robot adds its coding as "locked". This prevents accidental editing of the coding, which would blur the line between "robot" and "human" coding. There is an option to change this behaviour too.
The underlying GPT4 version was upgraded to use the latest version available (GPT4o).
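The "filling in the gaps" behaviour that applies when "Always add coding in the Robot's name" is un-ticked can be thought of as a merge in which existing coding always wins. The sketch below is a simplified illustration under that assumption; the function name and the dictionary representation of coding are hypothetical and do not reflect EPPI Reviewer's internal data model.

```python
from typing import Dict


def fill_gaps(existing: Dict[str, str], robot: Dict[str, str]) -> Dict[str, str]:
    """Add robot codes only where no coding is already present."""
    merged = dict(existing)  # never modify the existing coding in place
    for code, info in robot.items():
        merged.setdefault(code, info)  # keeps the existing value if there is one
    return merged


human_coding = {"population_number": "118"}
robot_coding = {"population_number": "120", "physical_activity": ""}
print(fill_gaps(human_coding, robot_coding))
```

In this example the human value for `population_number` is kept, and only the missing `physical_activity` code is added.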

Please note that the GPT4 API is not free. At present (June 20, 2024) we (as the EPPI Centre) are bearing the cost, because we strongly believe that it is important to support the production of good evaluations of the use of LLMs in evidence synthesis.

API Version History

The GPT models are explicitly designed to be continuously updated; however, the GPT API allows the client (EPPI Reviewer) to request a specific model and version combination. EPPI Reviewer does specify both model and version explicitly, as it is of paramount importance to keep track of them for evaluation purposes. Please refer to the version history below when compiling evaluation reports.

12 December 2023. Main Model: GPT4 8K. Version: 2023-07-01-preview
18 June 2024. Main Model: GPT4o. Version: 2024-02-01
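When compiling an evaluation report, you may want to record which model and version produced a given batch of coding. The sketch below looks this up from the date the coding was run. It assumes each entry in the history above took effect on the listed date and stayed in use until the next entry; the function name `model_in_use` is hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional, Tuple

# (effective date, main model, API version), copied from the history above.
VERSION_HISTORY = [
    (date(2023, 12, 12), "GPT4 8K", "2023-07-01-preview"),
    (date(2024, 6, 18), "GPT4o", "2024-02-01"),
]


def model_in_use(on: date) -> Optional[Tuple[str, str]]:
    """Return (model, version) assumed active on the given date, if any."""
    effective_dates = [entry[0] for entry in VERSION_HISTORY]
    index = bisect_right(effective_dates, on) - 1  # last entry not after 'on'
    if index < 0:
        return None  # before the first recorded entry
    _, model, version = VERSION_HISTORY[index]
    return model, version


print(model_in_use(date(2024, 1, 15)))
```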