Automation for discovery of inequalities research

Finding Accessible Inequalities Research in Public Health (the FAIR Database)

This work develops methods that apply machine learning and Natural Language Processing to the review, assessment, evaluation and summarisation of large volumes of public health research, in support of decision making. We have developed and applied automatic methods for identifying information about inequalities, study types and common themes mentioned within this research. The output of these techniques is available through an online tool (below) containing a continuously updated repository of public health research. Users are also able to upload their own data for processing and download results through EPPI-Reviewer.

The key research question we addressed is:

Can text mining be used to maintain a ‘living’ database of public health research, including information about topics, methods and inequalities?

We addressed this research question by developing a ‘living’ database of public health research in collaboration with public health decision-makers, researchers, patients and the public. The database is populated by identifying public health records among the more than 210 million records in OpenAlex. It is ‘living’ because it is updated regularly with newly published research.

Although the initial NIHR project is now officially complete, we are continuing to develop the technology and will update the tool periodically.

Version 1 (beta) of the database is available here.

A paper published in JAMIA Open describes the development of the database in detail: Zhixue Zhao, James Thomas, Gregory Kell, Claire Stansfield, Mark Clowes, Sergio Graziosi, Jeff Brunton, Iain James Marshall, Mark Stevenson, The FAIR database: facilitating access to public health research literature, JAMIA Open, Volume 7, Issue 4, December 2024, ooae139, https://doi.org/10.1093/jamiaopen/ooae139

Links to source code and data:

The PROGRESS-Plus dataset, used to train the classifiers, can be downloaded here. The NIHR has asked that we release this dataset under the CC-BY-NC-SA license. Please get in touch if you would like to make commercial use of the dataset.
The source code for the PROGRESS-Plus classifier and other workflows is available here.

Project staff include:

James Thomas and Claire Stansfield (UCL)
Iain Marshall and Gregory Kell (King's College London)
Mark Stevenson, Cass Zhixue Zhao and Mark Clowes (University of Sheffield)

This study/project is funded by the National Institute for Health Research (NIHR) Public Health Research Programme (NIHR133603). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
