Automated data extraction using GPT-4

We have recently added a connection between EPPI Reviewer and GPT-4, the OpenAI large language model (LLM). There are many possibilities for using LLMs in systematic reviews, and they need to be evaluated carefully before being used widely. For this reason, this feature should be used for evaluation purposes only. We do not currently support its use in real reviews.

The feature is only available in reviews where it has been enabled by EPPI Reviewer support staff.

To use the model in such reviews, follow these steps.

1. Create a code set

Create a code set using the standard tools and add the codes you want to extract information about (e.g. population and intervention characteristics). The text in the 'description' box is the prompt for the language model, and must follow a pre-specified format:

label: data type // prompt

The label should be a short, informative word or two describing the data to be extracted or identified. The data type must be one of three: number, boolean or string. If you specify number, the model will attempt to extract numeric information into the 'info' box. If you specify boolean, the model will check whether a given characteristic is present, and the box will be ticked or left unticked depending on whether it is found. If you specify string, the model will write text (such as a summary) into the 'info' box. The text after the // is the prompt. It should ask the model to find the specified information.

For example:

population_description: string // extract a detailed summary of the population characteristics

This will cause the model to summarise the population characteristics into the 'info' box.

population_number: number // the total number of participants in the study

This will cause the model to insert the number of people in the study into the 'info' box.

physical_activity: boolean // did the intervention contain a physical activity component?

This will cause the model to look for physical activity in the text describing the intervention. If it is found, the box will be ticked; otherwise it will be left unticked.
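
Because a description that does not follow the format may not be interpreted as intended, it can help to check descriptions before pasting them into the 'description' box. The following is a minimal Python sketch of such a check, not part of EPPI Reviewer. The names parse_description and CodeDescription are hypothetical, and the exact rules EPPI Reviewer applies (for example around spacing or capitalisation) are not documented here, so the pattern below is an assumption based only on the format shown above.

```python
import re
from typing import NamedTuple

# The three data types named in this guide.
ALLOWED_TYPES = {"number", "boolean", "string"}


class CodeDescription(NamedTuple):
    """Hypothetical container for the three parts of a description."""
    label: str
    data_type: str
    prompt: str


# Assumed layout: "label: data type // prompt".
# The first colon separates the label from the data type.
_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(\w+)\s*//\s*(.+?)\s*$")


def parse_description(text: str) -> CodeDescription:
    """Split a description into label, data type and prompt.

    Raises ValueError if the text does not match the assumed layout
    or if the data type is not number, boolean or string.
    """
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(
            f"expected 'label: data type // prompt', got: {text!r}"
        )
    label, data_type, prompt = match.groups()
    if data_type not in ALLOWED_TYPES:
        raise ValueError(
            f"data type must be one of {sorted(ALLOWED_TYPES)}, "
            f"got: {data_type!r}"
        )
    return CodeDescription(label, data_type, prompt)
```

Calling parse_description on each of the three examples above returns its label, data type and prompt; a description with a missing // or an unrecognised data type raises a ValueError.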

2. Ensure that the correct text is present in the abstract field

Currently, the model works using the text in the abstract field. The text can be quite long: up to 3,500 words. You can either simply use abstracts, or copy and paste relevant sections of papers into the abstract field. Be sure that the text has been saved before running the model.
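
If you are pasting sections of papers, you may want to check the length of the text first. The sketch below counts words by splitting on whitespace. This is an assumption: how EPPI Reviewer itself counts words is not stated here, so treat the result as an approximation. The function names are hypothetical.

```python
# Word limit for the abstract field, as stated in this guide.
MAX_WORDS = 3500


def count_words(text: str) -> int:
    """Count whitespace-separated words (an approximation)."""
    return len(text.split())


def fits_abstract_field(text: str, limit: int = MAX_WORDS) -> bool:
    """Return True if the text is at or under the word limit."""
    return count_words(text) <= limit
```

If fits_abstract_field returns False for your text, consider pasting only the sections most relevant to your codes.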

3. Run the model

Once you have prepared a data extraction / keywording tool using the structure above, and the right text is in the abstract field, you can run the model. Select the title of your code set on the left of the screen, and the 'auto-coding with GPT-4' button should appear. Click the button and wait a few seconds, and the results should appear in your code set.

4. Evaluate results

As mentioned above, this feature is currently for evaluation only, so please do evaluate your results.
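
One possible way to evaluate boolean codes is to compare the model's ticks with the decisions of a human reviewer for the same items. The sketch below is an illustration only, not a method prescribed by EPPI Reviewer. It assumes you have recorded both sets of decisions yourself as dictionaries keyed by an item identifier; the function name and the data layout are hypothetical.

```python
def boolean_agreement(human: dict, model: dict) -> dict:
    """Compare human and model boolean codes for the same items.

    Both arguments map an item identifier to True (ticked) or
    False (not ticked). Only items present in both are compared.
    """
    shared = human.keys() & model.keys()
    tp = sum(1 for k in shared if human[k] and model[k])
    fp = sum(1 for k in shared if not human[k] and model[k])
    fn = sum(1 for k in shared if human[k] and not model[k])
    tn = sum(1 for k in shared if not human[k] and not model[k])
    total = len(shared)
    return {
        # Share of items where human and model made the same decision.
        "agreement": (tp + tn) / total if total else 0.0,
        # Of the items the model ticked, the share the human also ticked.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the items the human ticked, the share the model also ticked.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Number and string codes need a different comparison, such as checking extracted numbers for exact matches and reading summaries against the source text.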