COVID-19: Living map of the evidence [ARCHIVED]

Introduction

FINAL ARCHIVED VERSION (Version 106 - 10th October, 2023) - published 24th October, 2023

Overview:
This archived living map (the Map) consists of bibliographic records of research articles on COVID-19 published from December 2019 until October 2023 [a]. Between 2020 and 2023, in response to the COVID-19 pandemic, we continually maintained this living map of international research evidence on COVID-19 for the UK Department of Health and Social Care. This map was transitioned out of living mode in October 2023, and the final, archived version can be accessed here (web database).

The Map was continually maintained using a fully automated workflow, hosted in EPPI-Reviewer Web (ER-Web).

Using this map:
There are various ways to view the map content. Please take a moment to familiarise yourself with the options described below.

Quick Start:
All the bibliographic records in the Map have been tagged (or 'keyworded') with one 'Topic' code, listed in the first expandable list on the left. To list all records of 'Treatment evaluation' studies, open the 'Topic' heading, click on 'Treatment evaluation' and then on the 'List records' button above. Select 'Home' to return to the previous screen. To see how many records are in each category under the 'Topic' code, select 'Topic' and click on the 'Frequencies' button above it. This opens a table in the middle of the screen showing how many records have been categorised under each heading. You can then list the records by clicking on the corresponding number. Similarly, all records have one version code, as well as being tagged with 'All versions'. To see how many records are in each version of the Map, select 'Version' or 'Previous versions' and click on the 'Frequencies' button. Clicking on each number will display that set of records.

Downloading data and exploring individual records:
The search features described above (and below) display a list of bibliographic records. This list can be downloaded as a plain text file, in Excel format, or as a RIS file for importing into reference manager software (such as Zotero or EndNote). Clicking an individual title in the list displays detailed information about that record. This defaults to standard bibliographic information plus the abstract, but can be expanded to include all bibliographic fields in the database. This screen also contains a 'show coding' button, which opens up the display to show the 'Topic' and 'Version' codes assigned to that specific record.
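
For readers who want to work with a downloaded RIS export programmatically, the following is a minimal Python sketch; the filename covid_map_export.ris is hypothetical and the parser only handles well-formed RIS records. It splits the export into records and reads the title (TI) and abstract (AB) fields.

# Minimal sketch: split a RIS export from the Map into records.
# The filename is hypothetical; RIS lines follow the pattern "TI  - value".
def parse_ris(path):
    records, current = [], {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("ER  -"):          # end-of-record marker
                records.append(current)
                current = {}
            elif len(line) > 6 and line[2:6] == "  - ":
                tag, value = line[:2], line[6:]
                current.setdefault(tag, []).append(value)   # repeated tags (e.g. AU) kept as lists
    return records

records = parse_ris("covid_map_export.ris")
print(len(records), "records downloaded")
print(records[0].get("TI"), records[0].get("AB"))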

Displaying a map or cross-tabulation:
Crosstabulation operates using the grouped headings of codes. To show a table of all records with topic across the top and version down the side, click on the heading 'Topic' and, at the bottom right of the screen, click on the 'Set X axis' button; then click on the 'Version' heading (on the left of the screen; or Previous versions... below Version) and click on the 'Set Y axis' button; finally, click on the 'Get Crosstab' button (bottom right of the screen). The resulting page will display a matrix showing the intersections of the categories under these two headings. By clicking on the 'Bubble map' button, this matrix can be changed from a table to a bubble map, with bubbles indicating the relative number of records in each cell (similar to the display in EPPI-mapper). The numbers / bubbles in the cells are clickable, and clicking on them will display a list of the records in that cell (below the table or bubble map - this may take some time to appear).

Finding a specific record or set of records:
There is a free-text search at the top of the 'Home' screen. This defaults to searching the title and abstract fields, but specific fields can be selected using the drop-down menu next to it.

Identifying the evidence:
Searches 1-34
Prior to implementing the OpenAlex-enabled (formerly MAG-enabled) workflow described below (see 'Versions 35 to 44' onwards), evidence in this Map was identified by conducting weekly searches of MEDLINE and Embase, beginning on Wednesday 4th March 2020, and updating the search strategy as necessary.

MEDLINE search strategy as at October 2020:

Database: Ovid MEDLINE(R) ALL <1946 to October 01, 2020>
Search Strategy:
--------------------------------------------------------------------------------
1 ("20200925" or "20200926" or "20200927" or "20200928").dt. (14732)
2 preprint.pt. (1048)
3 1 not 2 (14732)
4 limit 3 to covid-19 (1119)

The Embase search strategy as at October 2020:

Database: Embase <2016 to 2020 Week 40>
Search Strategy:
--------------------------------------------------------------------------------
1 "202040".em. (120134)
2 limit 1 to covid-19 (5705)

For further details of these MEDLINE/Embase search strategies, please see current OVID Covid-19 Expert Searches developed by Wolters Kluwer, available from: https://tools.ovid.com/ovidtools/expertsearches.html#corona - for example (October 2020):

Coronavirus (Covid-19) 2019-nCoV on MEDLINE
1. exp Coronavirus/
2. exp Coronavirus Infections/
3. (coronavirus* or corona virus* or OC43 or NL63 or 229E or HKU1 or HCoV* or ncov* or covid* or sars-cov* or sarscov* or Sars-coronavirus* or Severe Acute Respiratory Syndrome Coronavirus*).mp.
4. (or/1-3) and ((20191* or 202*).dp. or 20190101:20301231.(ep).) [this set is the sensitive/broad part of the search]
5. 4 not (SARS or SARS-CoV or MERS or MERS-CoV or Middle East respiratory syndrome or camel* or dromedar* or equine or coronary or coronal or covidence* or covidien or influenza virus or HIV or bovine or calves or TGEV or feline or porcine or BCoV or PED or PEDV or PDCoV or FIPV or FCoV or SADS-CoV or canine or CCov or zoonotic or avian influenza or H1N1 or H5N1 or H5N6 or IBV or murine corona*).mp. [line 5 removes noise in the search results]
6. ((pneumonia or covid* or coronavirus* or corona virus* or ncov* or 2019-ncov or sars*).mp. or exp pneumonia/) and Wuhan.mp.
7. (2019-ncov or ncov19 or ncov-19 or 2019-novel CoV or sars-cov2 or sars-cov-2 or sarscov2 or sarscov-2 or Sars-coronavirus2 or Sars-coronavirus-2 or SARS-like coronavirus* or coronavirus-19 or covid19 or covid-19 or covid 2019 or ((novel or new or nouveau) adj2 (CoV or nCoV or covid or coronavirus* or corona virus or Pandemi*2)) or ((covid or covid19 or covid-19) and pandemic*2) or (coronavirus* and pneumonia)).mp.
8. COVID-19.rx,px,ox. or severe acute respiratory syndrome coronavirus 2.os.
9. ("32240632" or "32236488" or "32268021" or "32267941" or "32169616" or "32267649" or "32267499" or "32267344" or "32248853" or "32246156" or "32243118" or "32240583" or "32237674" or "32234725" or "32173381" or "32227595" or "32185863" or "32221979" or "32213260" or "32205350" or "32202721" or "32197097" or "32196032" or "32188729" or "32176889" or "32088947" or "32277065" or "32273472" or "32273444" or "32145185" or "31917786" or "32267384" or "32265186" or "32253187" or "32265567" or "32231286" or "32105468" or "32179788" or "32152361" or "32152148" or "32140676" or "32053580" or "32029604" or "32127714" or "32047315" or "32020111" or "32267950" or "32249952" or "32172715").ui. [Articles not captured by this search when created in April 2020, pending further indexing by NLM]
10. or/6-9 [Lines 6 to 9 are specific to COVID-19]
11. 5 or 10
12. 11 and 20191201:20301231.(dt).
13. remove duplicates from 12

From 20th July 2020 (Search 20) until 26th October 2020 (Search 34), all unique (following de-duplication) 'new' MEDLINE/Embase records were also scored using a binary machine learning (ML) classifier described below (see 'Versions 35 to 44'). MEDLINE/Embase records scoring above an identified threshold score were retained for screening, while those scoring below this threshold were set aside.
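
The binary classifier and its threshold were built and calibrated inside ER-Web (see endnote d). Purely as an illustration of the idea, a threshold chosen to retain at least 95% of known includes could be calibrated along the following lines; this is a minimal Python/scikit-learn sketch with placeholder training data, not the production classifier.

# Illustration only (not the production ER-Web classifier): train a binary
# include/exclude model on title-abstract text and choose the highest score
# threshold that still retains at least 95% of the known 'include' records.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["remdesivir randomised trial in covid-19 patients",      # placeholder training data
         "unrelated cardiology cohort study"]
labels = np.array([1, 0])                                          # 1 = included, 0 = excluded

vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)

def calibrate_threshold(scores, y_true, target_recall=0.95):
    # Work downwards from the strictest cut-off until recall on includes is met.
    for t in sorted(set(scores), reverse=True):
        kept = scores >= t
        recall = (kept & (y_true == 1)).sum() / max((y_true == 1).sum(), 1)
        if recall >= target_recall:
            return t
    return scores.min()

scores = clf.predict_proba(vec.transform(texts))[:, 1]
threshold = calibrate_threshold(scores, labels)
retained = scores >= threshold        # records at or above the threshold go forward for screening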

Versions 35 to 44
From 9th November 2020 (Version 35) onwards, we stopped searching MEDLINE and Embase (see Searches 1-34, above) and began to identify the evidence using automated continuous prospective surveillance of the Microsoft Academic Graph (MAG) dataset [1]. A MAG-enabled workflow was operationalised using the new MAG Browser suite of tools in EPPI-Reviewer Web (ER-Web) [2]. The full MAG dataset, at this time, comprised >245 million bibliographic records of research articles on all topics across science, connected in a large network graph of conceptual, citation and author relationships [3].

Each time an updated copy of the MAG dataset was released by Microsoft™ (initially, every two weeks), all 'new' MAG records (i.e. records of articles not indexed in any preceding version of the MAG dataset - up to one million new records per update) and their associated metadata were automatically imported into MAG Browser (ER-Web) systems. New MAG records were then automatically scored by our novel Auto-Update machine learning (ML) recommender model.[b] The Auto-Update model exploits both network graph features [3] and text features of new MAG records (with reference to the same features of known 'include' MAG records identified and coded in preceding versions of this Map) to score and prioritise the new records, ranking them from highest to lowest score, for potential manual screening-coding (see Coding the evidence, below). Preprints, and articles from specific sources[c] that are invariably excluded from this Map, were automatically filtered out and discarded. We then retained the top-scoring (on Auto-Update) new, filtered MAG records - between 3,000 and 10,000 per MAG update, contingent on the time elapsed since the preceding update and the total number of new records in each update. From August 2021 (Version 68) onwards, we also deployed our binary ML classifier (see below) to score these new MAG records and automatically exclude those scoring below the identified threshold score, before importing the remaining records (scoring at or above the threshold) into ER-Web.
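
As an illustration of this retention step (the actual processing runs inside ER-Web, and the field names below are hypothetical), the filtering and ranking logic amounts to something like the following; the excluded sources are those listed in endnote c.

# Illustrative sketch (hypothetical field names): drop preprints and records from
# always-excluded sources, rank the remainder by Auto-Update score, and keep the top N.
EXCLUDED_SOURCES = {"New Scientist", "The Conversation", "NEJM Journal Watch",
                    "Veterinary Record", "Chemical & Engineering News", "Physics Today"}

def retain_top_records(records, n_retain=10000):
    eligible = [r for r in records
                if not r.get("is_preprint")
                and r.get("source") not in EXCLUDED_SOURCES]
    ranked = sorted(eligible, key=lambda r: r["auto_update_score"], reverse=True)
    return ranked[:n_retain]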

Next, we used ER-Web de-duplication tools to identify and remove duplicate new MAG records. We then re-scored the remaining top-scoring new MAG records (i.e. those ranked highest by our Auto-Update model) using a binary ML classifier that we designed to distinguish between title-abstract records included in (positive class), and those excluded from (negative class), this Map.[d] New MAG records scoring above an identified threshold score on the binary ML classifier were retained for potential manual screening and coding (see Coding the evidence), while those scoring below this threshold were set aside.
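
De-duplication was handled by ER-Web's own tools; purely as an illustration of the general idea, a naive version keyed on DOI, or on a normalised title when no DOI is present, might look like this:

# Naive de-duplication sketch (not ER-Web's algorithm): records sharing a DOI,
# or a normalised title when no DOI is present, are treated as duplicates.
import re

def normalise(title):
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()

def deduplicate(records):
    seen, unique = set(), []
    for r in records:
        key = r.get("doi") or normalise(r.get("title"))
        if key and key in seen:
            continue                      # duplicate of an earlier record
        seen.add(key)
        unique.append(r)
    return unique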

Versions 45 to 79
From Version 45 onwards, we supplemented the set of top-scoring new records (on the Auto-Update (ContReview) model) with a second set of MAG records, from each update of the MAG dataset, identified using a COVID-19 'custom search' that we developed and executed using MAG Browser tools in ER-Web. The custom search expression was as follows:

OR(And(W='severe',W='acute',W='respiratory',W='syndrome',W='coronavirus'),And(W='coronavirus',W='19'),And(W='coronavirus',W='2019'),And(W='covid',W='19'),And(W='covid',W='2019'),W='covid19',And(W='2019',W='ncov'),And(W='middle',W='east',W='respiratory',W='syndrome',W='coronavirus'),And(W='corona',W='virus',W='disease',W='2019'),And(W='new',W='coronavirus'),And(W='novel',W='coronavirus'),And(W='sars',W='cov2'),And(W='sars',W='cov',W='2'),And(W='sars',W='coronavirus',W='2'),Composite(F.FId=3008058167),Composite(F.FId=3007834351),Composite(F.FId=3006700255))

We restricted this custom search to MAG records, in each update of the MAG dataset, with publication dates on or after 6th July 2020, and (automatically) only imported those records that had not previously been either (a) imported into ER-Web from MAG or (b) matched to a corresponding MEDLINE/Embase record. These records were then processed into our Priority Screening workflow in precisely the same way as the set of top-scoring new MAG records on our Auto-Update model (see 'Versions 35 to 44', above), except that we periodically updated our ML classifier; its final version was deployed for Version 97 (March 2022).

Versions 80 to 89
With each MAG update, four sets of records were imported. The first set came from an AutoUpdate search based on the records included in the COVID-19 Map. Records were discarded using the ML Classifier v3 and a key-term search (records not containing at least one of the key terms 'COVID', 'COVID19', 'nCov', 'coronavirus', 'corona virus', 'pandemic', 'SARS', 'Moderna', 'Ad26.COV2.S' or 'BNT162b2' were discarded). The second set came from the custom search described above, limited to the previous six months, with records discarded in the same way as for the first set. The third set came from an AutoUpdate search based on records included in the Long COVID section of the Map, with records discarded using the Long COVID Classifier and the key-term search. The fourth set came from a network graph search retrieving records that cite or are cited by (citations), or that recommend or are recommended by (related publications), records included in the Long COVID section of the Map, with records discarded by the Long COVID Classifier, the key-term search and a search for pre-prints and other excluded items.
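
As an illustration of the key-term search, assuming a case-insensitive substring match over titles and abstracts (our reading of the step, not the exact ER-Web implementation):

# Key-term filter sketch: keep a record only if its title or abstract contains
# at least one of the key terms listed above (case-insensitive substring match assumed).
KEY_TERMS = ["covid", "covid19", "ncov", "coronavirus", "corona virus",
             "pandemic", "sars", "moderna", "ad26.cov2.s", "bnt162b2"]

def passes_key_term_filter(record):
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(term in text for term in KEY_TERMS)

records = [{"title": "BNT162b2 booster effectiveness", "abstract": ""},
           {"title": "Hip fracture outcomes", "abstract": ""}]
kept = [r for r in records if passes_key_term_filter(r)]   # keeps only the first record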

Versions 90 to 106 (final archived version)
From January-February 2022 until October 2023, we switched to automated searches of the new OpenAlex dataset [6], which superseded MAG at the end of 2021. The OpenAlex dataset incorporates and maintains the MAG dataset (knowledge graph). OpenAlex searches were conducted using our re-engineered OpenAlex Browser tools (which replaced the MAG Browser tools) in ER-Web. The types of OpenAlex searches used to maintain the Map (two AutoUpdate searches, a custom search and a network graph search) were the same as those used for the MAG dataset (see Versions 80 to 89).
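
The searches themselves were run via the OpenAlex Browser tools in ER-Web. For orientation only, retrieving COVID-19 works for a given date window directly from the public OpenAlex API could look roughly like the sketch below; it uses the documented works endpoint with cursor paging, and the search term and dates are illustrative (the dates shown match the final update window reported in the Results section).

# Orientation sketch only (not the ER-Web OpenAlex Browser implementation):
# page through the public OpenAlex works API for COVID-19 records in a date window.
import requests

URL = "https://api.openalex.org/works"
params = {
    "search": "covid-19",                                               # illustrative search term
    "filter": "from_publication_date:2023-07-13,to_publication_date:2023-10-05",
    "per-page": 200,
    "cursor": "*",                                                      # cursor paging
}

works = []
while True:
    page = requests.get(URL, params=params, timeout=30).json()
    works.extend(page["results"])
    cursor = page["meta"].get("next_cursor")
    if not cursor:                                                      # no further pages
        break
    params["cursor"] = cursor

print(len(works), "works retrieved")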

Coding the evidence:
Versions 1 to 29
Prior to using the binary ML classifier score to discard low-scoring records (see Identifying the evidence, above) and conducting screening-coding in priority screening mode (see Versions 30 to 82, below), we screened-coded all retrieved MEDLINE/Embase records in quasi-random order (i.e. without prioritisation), with weekly screening-coding assignments of varying sizes allocated among coding team members. Each coding team member could also refer selected records for a second opinion, to be resolved by team discussion and consensus.

Versions 30 to 82
From 28th September 2020 (Version 30) to March 2022 (Version 97), screening and coding of new retained MEDLINE/Embase records (up to Search 34) or MAG records (from Version 35 onwards) was conducted using priority screening mode in ER-Web. In priority screening mode, retained records (i.e. top-scoring on the AutoUpdate model and above the threshold score on the binary ML classifier - see Identifying the evidence) were screened in prioritised rank order (highest to lowest) based on scores assigned by the binary ML classifier, and the rank order of the records awaiting screening was periodically reprioritised based on all preceding coding decisions (i.e. active learning [4, 5]).

From 28th September 2020 (Version 30) onwards, team members had fixed weekly targets so that, together, they screened and coded a total of 1,500 records each week using priority screening mode. All retained records that were not screened-coded by the team in a given week were carried forward, along with new records from the next updated version of the MAG dataset (or, for Searches 30 to 34, from the next MEDLINE/Embase searches), to the pool of records to be reprioritised for screening-coding during a subsequent week. The option of referring selected records for a second opinion remained in the MAG-enabled workflow, and second-opinion records were resolved by team discussion and consensus.
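
The reprioritisation step is performed inside ER-Web. Conceptually, the active learning loop after each week's coding can be sketched as follows; the model and data are placeholders, not ER-Web's implementation.

# Conceptual active-learning sketch (placeholder model, not ER-Web's implementation):
# refit a classifier on all coding decisions so far and re-rank the unscreened pool
# so that the records most likely to be included are screened first.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def reprioritise(screened_texts, screened_labels, unscreened_texts):
    vec = TfidfVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(screened_texts), screened_labels)
    scores = clf.predict_proba(vec.transform(unscreened_texts))[:, 1]
    order = scores.argsort()[::-1]                       # highest predicted-include score first
    return [unscreened_texts[i] for i in order]

queue = reprioritise(["covid vaccine trial", "soil chemistry study"], [1, 0],
                     ["sars-cov-2 antibody response", "bridge engineering survey"])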

Versions 83 to 97
From 26th November 2021 (MAG dataset updated on 8th November), we applied our Bidirectional Encoder Representations from Transformers (BERT) COVID-19 Categories model, which we trained, calibrated and evaluated on manually coded records. The model was used to assign imported records to topic codes if they scored above probability thresholds that were calibrated to achieve over 0.95 accuracy across all codes. All remaining records were added to the pool of records for manual screening and coding, in priority screening mode, as above.
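
The routing logic for each imported record can be summarised as follows; the threshold values shown are placeholders rather than the calibrated production values, and the BERT model itself is not reproduced here.

# Routing sketch: auto-assign a topic code only when the BERT model's probability
# for that code clears its calibrated threshold; otherwise route the record to the
# manual priority-screening pool. Threshold values below are placeholders.
THRESHOLDS = {"Treatment evaluation": 0.97, "Diagnosis": 0.96}       # hypothetical values

def assign_or_defer(code_probabilities):
    """code_probabilities: dict mapping topic code -> model probability for one record."""
    best_code, best_p = max(code_probabilities.items(), key=lambda kv: kv[1])
    if best_p >= THRESHOLDS.get(best_code, 1.0):
        return ("auto-coded", best_code)            # added to the Map automatically
    return ("manual screening", None)               # added to the priority-screening pool

print(assign_or_defer({"Treatment evaluation": 0.99, "Diagnosis": 0.01}))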

Versions 98 to 106 (final archived version)
From March 2022 (after Version 97) until October 2023 (Version 106), manual coding was discontinued. We continued to import records when the OpenAlex dataset was updated, processed them as described above, and added to the Map those records that the BERT model assigned to an 'include' topic code.

Manual coding
The criteria to manually code records for inclusion in the Map, under each category heading (topic code), were as follows. 

  • Primary empirical data, systematic review,* modelling,** full economic evaluation,*** or novel analysis on COVID-19
    1. Treatment evaluation
      • Any intervention aimed at treatment, prevention and/or rehabilitation of COVID-19 (i.e. with either a population of COVID-19 patients or COVID-19 incidence as an outcome), including vaccines
      • Prospective outcome studies with comparison between researcher-allocated groups (i.e. randomised trials, quasi-randomised trials, and non-randomised trials with researcher allocation)
      • Include systematic reviews* that aimed to include studies meeting these criteria, whether or not any were located
      • Exclude observational/retrospective studies including treatment as an exposure and uncontrolled studies (code as Treatment development)
      • Exclude case reports with some information about treatment (code as Case reports)
      • Exclude basic science with claimed relevance to interventions, but without evaluation of effectiveness in human patients (code as Treatment development or Vaccine development)
    2. Transmission / risk / prevalence
      • Epidemiological modelling of spread (incl. studies which aim to model health outcomes or health system outcomes; include population mortality rates (i.e. deaths relative to total population); exclude case fatality rates (i.e. deaths relative to COVID-19 cases)); include genetic epidemiology if the main focus is on spread of disease (if the focus is on characterising strains, code as Genetics / biology)
      • Risk modelling
      • Studies of viral persistence in bodily secretions/tissues or in the environment, e.g. on surfaces (including methods to inactivate the virus in these contexts) 
      • Population prevalence studies (including seroprevalence)
      • Studies of risk factors for developing COVID-19 at individual level (not risks of developing more severe disease/complications among people infected with COVID-19; code as Health impacts) or at population level
      • Studies of the effectiveness of non-drug prevention strategies e.g. masks, contact tracing (only if data on prevalence reported). Include modelling studies of impacts of vaccines on prevalence at population level.
      • Exclude data on preventive behaviour outcomes only (code as Social / economic / indirect impacts). Exclude studies of disinfection, aerosolisation etc. if they do not include data on COVID-19 (code as Not on topic) 
    3. Diagnosis
      • Sensitivity and specificity of tests for COVID-19 (including antibody tests)
      • Training clinicians in diagnosis
      • Include studies of clinical signs if the main focus is on their diagnostic value (if the focus is on health outcomes or prognosis, code under Health impacts)
      • Exclude diagnosis of other conditions/comorbidities in COVID-19 patients
    4. Health impacts of COVID-19
      • Any observational study with a population of COVID-19-infected patients measuring physical health outcomes (incl. case fatality rates, QALYs or DALYs) and/or somatic indicators (code studies reporting prevalence data here if they also present health impacts)
      • Include studies of prognostic factors, indicators of disease progress or severity
      • Studies of comorbidities (e.g. coinfections), if not explicitly analysed as risk factors for infection 
    5. Vaccine development
      • Basic science aimed at development of vaccines
      • Include animal studies testing human vaccines only
      • Studies looking at vaccines but not meeting methodological criteria for Treatment evaluation, e.g. observational/retrospective studies including vaccine receipt as an exposure (even if measuring prevalence as an outcome), studies without researcher-allocated control group, or pre-post studies of antibody response
      • Studies of vaccine safety/side-effects
      • (Studies of vaccine hesitancy/intentions to be vaccinated/attitudes towards vaccination - code under Social / economic / indirect impacts)
    6. Treatment development
      • Basic science aimed at development of treatment, e.g. drug discovery (including in silico molecular docking studies)
      • Include animal studies testing human treatments only
      • Studies looking at treatments but not meeting methodological criteria for Treatment evaluation, e.g. observational/retrospective studies including treatment as an exposure, studies without researcher-allocated control group, or modelling based on evaluation data (but exclude studies of outcomes which simply state that treatment was administered without relating outcomes to treatments - code under Health impacts)
      • Studies of treatment safety/side-effects
      • Training clinicians to deliver interventions
      • Include studies of drug treatments used to prevent infection
      • (Studies of relevance to both vaccine and treatment development - code under Treatment development)       
    7. Genetics / biology of virus
      • Any data on the genetic or biological characteristics of the virus, or of mechanisms or responses to infection (including antibody responses or humoral immunity, if not clearly aimed at diagnosis/vaccine development)
      • Include modelling on the basis of secondary data analysis
      • Exclude studies of biological mechanisms theoretically linked to COVID-19 infection, but without data which actually concern COVID-19
      • If explicitly aimed at treatment (resp. vaccine) development, code as Treatment (resp. Vaccine) development; if quantifying seroprevalence, code as Transmission / risk / prevalence
    8. Case reports - patients
      • Medical case reports of small numbers of patients considered as individuals
      • Include any case with confirmed COVID-19 or symptoms or history suggestive of COVID-19 infection (otherwise code as Not on topic; case studies of health professionals and mental health consequences of lockdown are also not on topic)
      • Include mental health cases tested using the "Fear of COVID" scale or equivalent
      • Include case reports of adverse effects of vaccines
    9. Case study - organisation
      • Descriptive studies setting out organisational responses/strategies to COVID-19
      • Surveys of professionals/institutions on organisational responses (not broader knowledge or attitudes to COVID-19 - code these as Social / economic / indirect impacts); any studies focused on service delivery e.g. performance of clinical procedures pre- and post-COVID-19 (or delivery of treatments, if no outcome data)
      • Include any organisation (healthcare or other) and any form of response to COVID-19, whether directly concerning COVID-19 patients or not
      • Exclude guidance or recommendation papers which do not describe the recommended measures being implemented in a specific case
    10. Social / economic / indirect impacts
      • Include studies mainly focusing on behaviour, attitudes etc.
      • Studies of information (e.g. analysis of websites or social media)
      • Surveys of professionals if not mainly focused on organisational responses
      • Studies of behaviour or health outcomes of patients without diagnosed COVID-19 (including total excess mortality, unless separable data on COVID-19 mortality are available)
      • Studies of other impacts of COVID-19 or COVID-19 control measures (e.g. environmental impacts of lockdown)
      • Include indirect health impacts on healthcare workers (e.g. from PPE use)
      • Studies of vaccine hesitancy/intention to be vaccinated, vaccine uptake/coverage
      • Studies of access to services (vaccination, treatment, testing, etc.) for COVID-19
      • Indirect impacts of diagnostic procedures (e.g. radiation)
    11. Mental health impacts 
      • Include both COVID-19 patients and/or indirect mental health impacts on the broader population (or healthcare workers, etc.)
      • Include mental health status (anxiety, depression, etc.) and sleep-related outcomes
      • Where studies have an equal focus on mental health impacts and health and/or indirect impacts, code as mental health impacts
      • Exclude if there are no COVID-related measures and only one time point (no before-and-during or -after comparison)

* Define systematic review as any paper reporting secondary data which reports: some search terms; clearly defined inclusion criteria; and some information on the selection process (at least the number of references located by searches and the number of studies included). Include any systematic review which aimed to include studies on COVID-19, whether or not any were located. Include updates to systematic reviews and living reviews if the report presents new data and the original review meets the criteria above.
** Include modelling studies which are at least partly based on empirical data related to COVID-19 (e.g. data used as inputs to the model, or data against which the model is being calibrated or tested); code purely theoretical modelling as not primary data.
*** Include full economic evaluations (i.e. cost-effectiveness analyses, cost-minimisation analyses, cost-utility analyses, or cost-benefit analyses - see https://yhec.co.uk/resources/glossary/). Include model-based and single-study based economic evaluations. Code topic based on the main focus/ aim of the study (e.g. code cost-effectiveness analyses of clinical treatments or management strategies for COVID-19 as Treatment evaluation).

In general, code using the main aim (or the main focus) of the paper if it covers more than one topic. Code systematic reviews by the inclusion criteria or the focus of the included papers.

Two exclude codes were originally displayed in the Map, but since 21st April 2020 they are no longer shown. Since 12th February, they have been combined into one exclude code.

  1. Other viruses (SARS, MERS, etc.)
    • Anything on human coronaviruses other than COVID-19; include both primary data and non-data papers
  2. No primary empirical data, systematic review or modelling
    • Thinkpieces, non-systematic reviews, guidance, consensus statements, hypotheses, etc.
    • Protocols for studies or reviews which do not report findings data
    • Systematic reviews which do not report findings data (mapping reviews; reviews which only contain guideline documents / opinion pieces)
    • Methods papers (including validation of data collection methods if usable primary data not reported)
    • Corrections, errata, retractions
    • Responses or replies which do not report substantive new data or analysis
    • Items in data repositories (e.g. Protein Data Bank, Mendeley Data, OSF), and patents

The remaining excludes were all not on topic, pre-prints (which may be published in a journal if they pass peer review; including all Rxiv, Research Square, Authorea, and SSRN articles), or duplicates identified while screening and coding.

To access a RIS file for any of these codes, please email theo.lorenc@york.ac.uk

Results:
Results for versions 1 to 55 are in previous versions of the Map (available on request).
Results for version 56 to 90 are in the appendix at the end of this document.

  • For Version 91, published on 17th February, 2022, we coded 908 records: 724 were added to the Map, 83 were excluded, and 101 were not on topic. The remaining 7,062 records were carried forward.
  • For Version 92, published on 24th February, 2022, we coded 1,134 records: 778 were added to the Map, 169 were excluded, and 187 were not on topic. The remaining 5,928 records were carried forward.
  • On 3rd March, 2022, we imported four sets of records from the OpenAlex dataset updated 22nd February, 2022. In total, we imported 30,186 records, of which 1,452 were duplicates, 10,806 were discarded after scoring by ML classifiers, and 9,828 were discarded after searches for selected keywords or stop-words. Of the remaining 8,100 records, 5,736 were automatically assigned to topic codes (for publication in Version 93 of the map), by applying the BERT model. The other 2,364 records were added to the 5,928 unscreened records, making 8,292 records to be screened.
  • For Version 93, published on 4th March, 2022, we coded 282 records: 165 were added to the Map, 51 were excluded, and 66 were not on topic. Of the 5,736 records that were coded using the BERT model, 807 were excluded, 291 were not on topic and 4,638 were added to the Map. The remaining 8,010 records were carried forward.
  • For Version 94, published on 10th March, 2022, we coded 959 records: 755 were added to the Map, 87 were excluded, and 117 were not on topic. The remaining 7,051 records were carried forward.
  • On 17th March, 2022, we imported two sets of records from the OpenAlex dataset updated 28th February, 2022. In total, we imported 14,662 records, of which 87 were duplicates, 3,745 were discarded after scoring by ML classifiers, and 7,507 were discarded after searches for selected keywords or stop-words. Of the remaining 3,323 records, 2,513 were automatically assigned to topic codes (for publication in Version 95 of the map), by applying the BERT model. The other 810 records were added to the 7,051 unscreened records, making 7,861 records to be screened.
  • For Version 95, published on 18th March, 2022, we coded 840 records: 509 were added to the Map, 177 were excluded, and 154 were not on topic. Of the 2,513 records that were coded using the BERT model, 218 were excluded, 202 were not on topic and 2,093 were added to the Map. The remaining 7,021 records were carried forward.
  • On 18th March, 2022, we imported a third set of records from the custom search of the OpenAlex dataset updated on 28th February, 2022. There were 2,743 records, of which 115 were duplicates, 1,284 were discarded after scoring by ML classifiers, and 24 were discarded after searches for selected keywords and stop-words. Of the remaining 1,320 records, 758 were automatically assigned to topic codes (for publication in Version 96 of the Map) by applying the BERT model. The other 562 records were added to the 7,021 unscreened records, making 7,583 records to be screened.
  • For Version 96, published on 24th March, 2022, we coded 989 records: 533 were added to the Map, 225 were excluded, and 231 were not on topic. Of the 758 records that were coded using the BERT model, 225 were excluded, 27 were not on topic and 506 were added to the Map. In addition, eight records that were identified by another project were added to the Map; 43 records that were not primary data were removed from the Map, and six were removed from the screening pool. The remaining 6,588 records were carried forward.
  • For Version 97, published on 31st March, 2022, we coded 733 records: 532 were added to the Map, 110 were excluded, and 91 were not on topic. The remaining 5,855 records were carried forward.
  • On 26th April, 2022, we imported three sets of records from the OpenAlex dataset updated 11th March, 2022, comprising a total of 7,565 records, of which 152 were duplicates, 2,348 were discarded after scoring by ML classifiers, and 1,886 were discarded after searches for selected keywords and stop-words. Of the remaining 3,179 records, 2,312 were automatically coded by applying the BERT model, and the other 867 records were set aside.
  • For Version 98, published on 26th April, 2022, of the 2,312 records that the BERT model automatically coded, 2,003 were included in the Map, 236 were excluded, and 73 were not on topic. Fifty records were manually coded; 43 were added to the Map, five were excluded, and two were not on topic.
  • On 22nd September, we imported four sets of records into ER-Web from automated searches of multiple, consecutive updates of the OpenAlex dataset from March 2022 up to 31st August 2022, comprising a total of 116,014 records, of which 1,799 were duplicates, 49,145 were discarded after scoring by machine learning classifiers, and 13,021 were discarded after searches for selected keywords and stop-words. Of the remaining 52,049 records, 36,763 were automatically coded based on results from applying the BERT model, and the other 15,286 records were set aside.
  • For Version 99, published on 19th October, 2022, of the 36,763 automatically coded records, 32,105 records were included and added to the Map, and the other 4,658 records were excluded or not on topic.
  • In January 2023, we imported into ER-Web 86,708 records from automated searches of multiple, consecutive updates of the OpenAlex dataset from September 2022 up to 4th January 2023. Of these, 1,883 were duplicates, 51,548 were discarded after scoring by machine learning classifiers, and 4,546 were discarded after searches for selected keywords and stop-words. Of the remaining 28,731 records, 21,103 were automatically coded based on results from applying the BERT model, and the other 7,628 records were not automatically coded.
  • For Version 100, published on 17th January, 2023, of the 21,103 automatically coded records, 18,410 records were included and added to the Map, and the other 2,693 records were excluded or not on topic.
  • On 7th March 2023, we imported 55,482 records from automated searches of the February 2023 update ('snapshot') of the OpenAlex dataset, from 5th January 2023 up to 21st February 2023 (released 24th February 2023), into ER-Web. Of these, 700 were duplicates, 20,742 were discarded after scoring by machine learning classifiers, and 7,828 were discarded after searches for selected keywords and stop-words. Of the remaining 26,212 records, 19,128 were automatically coded based on results from applying the BERT model, and the other 7,084 records were not automatically coded.
  • For Version 101, published on 10th March, 2023, of the 19,128 automatically coded records, 16,805 records were coded as included and added to the Map, and the other 2,323 records were coded as excluded.
  • On 31st March 2023, we imported 24,228 records from automated searches of the March 2023 update ('snapshot') of the OpenAlex dataset, from 22nd February 2023 up to 31st March 2023 (released 31st March 2023), into ER-Web, of which 241 were duplicates, 5,667 were discarded after scoring by machine learning classifiers, and 9,845 were discarded after searches for selected keywords and stop-words. Of the remaining 8,475 records, 6,286 were automatically coded based on results from applying the BERT model, and the other 2,189 records were not automatically coded. 
  • For version 102, published on 3rd May, 2023, of the 6,286 automatically coded records, 5,531 records were coded as included and added to the Map, and the other 755 records were coded as excluded.
  • On 16th May 2023, we imported 28,267 records from automated searches of the 3rd May 2023 update ('snapshot') of the OpenAlex dataset (released on 9th May 2023), from 1st April 2023 up to 3rd May 2023, into ER-Web, of which 907 were duplicates, 7,201 were discarded after scoring by machine learning classifiers, and 7,973 were discarded after searches for selected keywords and stop-words. Of the remaining 12,186 records, 8,881 were automatically coded based on results from applying the BERT model, and the other 3,305 records were not automatically coded.
  • For version 103, published on 18th May, 2023, of the 8,881 automatically coded records, 7,720 records were coded as included and added to the Map, and the other 1,161 records were coded as excluded.
  • On 9th June 2023, we imported 23,167 records from either (i) automated searches of the 2nd June 2023 update ('snapshot') of the OpenAlex dataset (released on 5th June 2023), from 4th May 2023 up to 2nd June 2023, or (ii) additional searches for Long COVID systematic reviews (up to January 2023) or treatment evaluations (up to March 2023), into ER-Web. Of these, 539 were duplicates, 6,957 were discarded after scoring by machine learning classifiers, and 7,173 were discarded after searches for selected keywords and stop-words. Of the remaining 8,498 records, 6,272 were automatically coded based on results from applying the BERT model, 2,222 records were not automatically coded, and 4 Long COVID treatment evaluations were retained.
  • Of 6,272 automatically coded records, 5,482 records were coded as included and added to the Map for publication in Version 104 (published on 13th June 2023), and the other 790 records were coded as excluded. Four retained Long COVID treatment evaluations were also added to the Map for publication in Version 104.
  • On 14th July 2023, we imported 32,226 records from automated searches of the 12th July 2023 update ('snapshot') of the OpenAlex dataset, from 3rd June 2023 up to 11th July 2023, into ER-Web. Of these, 794 were duplicates, 9,836 were discarded after scoring by machine learning classifiers, and 8,596 were discarded after searches for selected keywords and stop-words. Of the remaining 13,000 records, 9,553 were automatically coded based on results from applying the BERT model, and the other 3,447 records were not automatically coded.
  • For Version 105 - 12th July 2023, published on 19th July 2023, of the 9,553 automatically coded records, 8,227 records were coded as included and added to the Map, and the other 1,326 records were coded as excluded.
  • On 7th and 18th October 2023, we imported 67,154 records from two sources into ER-Web: (i) automated searches of the OpenAlex dataset, from 13th July 2023 up to 5th October 2023; and (ii) additional searches for long COVID systematic reviews or treatment evaluations (up to July 2023). Of these records, 4,968  were duplicates, 19,275 were discarded after scoring by machine learning classifiers, and 21,324 were discarded after searches for selected keywords and stop-words. Of the remaining 21,587 records, 8 were pre-coded as 'treatment evaluations', 3 records were manually coded, 15,640 were automatically coded based on results from applying the BERT model, and the other 5,936 records were not automatically coded.
  • Of 15,651 pre-coded, manually coded or automatically coded records, 13,293 records were coded as included and added to the map for publication in 'Version 106 - 10th October 2023' (published on 24th October 2023), while the other 2,358 records were coded as excluded.

Endnotes:
a Primary empirical data, systematic review, modelling, full economic evaluation, or novel analysis on COVID-19 - see Coding the evidence.
b Our ContReview model (the Auto-Update recommender model referred to above) was built and tested (in collaboration with Microsoft™) using MAG records of COVID-19 research articles that we had matched to records included in the Map up to Search 19 (and which are therefore also indexed in MEDLINE and/or Embase).
c New Scientist, The Conversation, NEJM Journal Watch, Veterinary Record, Chemical & Engineering News and Physics Today.
d The binary ML classifier was built and tested in ER-Web using MAG records of COVID-19 research articles that we had matched to records included in, and excluded from, our Map up to Search 19 (and which are, therefore, also indexed in MEDLINE and/or Embase). It was calibrated to achieve at least 0.95 recall among MEDLINE-Embase records included in the Map, with a corollary workload reduction of around 30% (compared with screening all MEDLINE-Embase records).

Suggested citation for this Map:
Lorenc T, Khouja C, Raine G, Shemilt I, Sutcliffe K, D'Souza P, Burchett H, Hinds K, Khatwa M, Macdowall W, Melton H, Richardson M, South E, Stansfield C, Thomas S, Kwan I, Wright K, Sowden A, Thomas J (2020) COVID-19: living map of the evidence. London: EPPI-Centre, Social Science Research Unit, UCL Social Research Institute, University College London.

Funding:
This Map was commissioned by the National Institute for Health Research (NIHR) Policy Research Programme (PRP) for the Department of Health and Social Care (DHSC) and Public Health England (PHE). It was funded through the NIHR PRP contract with the EPPI Centre at UCL (Reviews facility to support national policy development and implementation, PR-R6-0113-11003). Any views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the DHSC.

Conflicts of interest:
None.

Contributions:
Any opinions expressed in this publication are not necessarily those of the EPPI-Centre or the funders. Responsibility for any views expressed remains solely with the authors.

References:

  1. Sinha A, Shen Z, Song S, Ma H, Eide D, Hsu B-J, Wang K. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion): 243-246. ACM, New York, NY, USA. 2015. https://academic.microsoft.com/paper/1932742904
  2. Thomas J, Graziosi S, Brunton J, Ghouze Z, O'Driscoll P, Bond M (2020). EPPI-Reviewer: advanced software for systematic reviews, maps and other evidence synthesis [Software]. https://eppi.ioe.ac.uk/CMS/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4  
  3. Shemilt I, Thomas J. MAG-Net-ise it! How the use of Microsoft Academic Graph with machine learning classifiers can revolutionise study identification for systematic reviews. Oral paper accepted for presentation at the 26th Cochrane Colloquium, Santiago, Chile, 22-25 October 2019. https://colloquium2019.cochrane.org/abstracts/mag-net-ise-it-how-use-microsoft-academic-graph-machine-learning-classifiers-can  
  4. Miwa M, Thomas J, O'Mara-Eves A, Ananiadou S. Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics 2014; 51: 242-253. https://academic.microsoft.com/paper/2099883114 
  5. O'Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic Reviews 2015; 4: 5. https://academic.microsoft.com/paper/2147469877  
  6. Chawla DS. Massive open index of scholarly papers launches [News]. Nature 2022, 24th January. https://explore.openalex.org/works/W4207002271

Copyright:
Authors of the systematic reviews on the EPPI-Centre website http://eppi.ioe.ac.uk hold the copyright for the text of their reviews. The EPPI-Centre owns the copyright for all material on the website it has developed, including the contents of the databases, manuals, and keywording and data-extraction systems. The centre and authors give permission for users of the site to display and print the contents of the site for their own non-commercial use, providing that the materials are not modified, copyright and other proprietary notices contained in the materials are retained, and the source of the material is cited clearly following the citation details provided. Otherwise users are not permitted to duplicate, reproduce, re-publish, distribute, or store material from this website without express written permission.

The NIHR Policy Research Programme Reviews Facility is a collaboration between the following:

UCL Social Research Institute

EPPI-Centre (Evidence for Policy and Practice Information and Co-ordinating Centre)

London School of Hygiene & Tropical Medicine

University of York, Centre for Reviews and Dissemination

Appendix:

  • For Version 56, published on 6th May, 2021, we coded 1,507 records: 973 were added to the Map, 229 were excluded and 305 were not on topic. The remaining 74,496 records were carried forward.
  • On 11th May, 2021, we imported two sets of records from the MAG dataset updated on 26th April. The first set contained 9,432 records, of which 101 were duplicates and 3,034 were discarded, using the new ML classifier, leaving 6,297 records. The second set contained 4,156 records, of which 59 were duplicates and 1,990 were discarded, leaving 2,107 records. In total, 8,404 records were added to the pool of 74,496 unscreened records, making 82,900 records to be coded.
  • For Version 57, published on 13th May, 2021, we coded 1,502 records: 1,456 were added to the Map, 36 were excluded and 10 were not on topic. The remaining 81,398 records were carried forward.
  • For Version 58, published on 20th May, 2021, we coded 1,520 records: 1,033 were added to the Map, 193 were excluded, and 294 were not on topic. The remaining 79,878 unscreened records were carried forward.
  • On 25th May, 2021, we imported two sets of records from the MAG dataset updated on 10th May. The first set contained 9,902 records, of which 154 were duplicates and 2,988 were discarded, using the new ML classifier, leaving 6,760 records. The second set contained 4,670 records, of which 67 were duplicates and 2,143 were discarded, leaving 2,460 records. In total, 9,220 records were added to the pool of 81,398 unscreened records, and three additional duplicates were removed, making 89,095 records to be coded.
  • For Version 59, published on 27th May, 2021, we coded 1,500 records: 1,327 were added to the Map, 85 were excluded and 88 were not on topic; one duplicate was removed from the Map. The remaining 87,595 records were carried forward.
  • On 9th June, 2021, we imported two sets of records from the MAG dataset updated on 24th May. The first set contained 9,609 records, of which 181 were duplicates and 2,998 were discarded, using the new ML classifier, leaving 6,430 records. The second set contained 4,965 records, of which 100 were duplicates and 2,267 were discarded, leaving 2,598 records. In total, 9,028 records were added to the pool of 87,595 unscreened records, and three additional duplicates were removed, making 96,620 records to be coded.
  • For Version 60, published on 10th June, 2021, we coded 1,761 records: 1,391 were added to the Map, 191 were excluded and 179 were not on topic. The remaining 94,859 records were carried forward.
  • For Version 61, published on 17th June, 2021, we coded 1,272 records: 1,135 were added to the Map, 87 were excluded and 50 were not on topic. The remaining 93,587 records were carried forward.
  • For Version 62, published on 24th June, 2021, we coded 1,741 records: 1,157 were added to the Map, 332 were excluded and 252 were not on topic. One duplicate was removed from the screening pool. The remaining 91,845 records were carried forward.
  • On 24th June, 2021, we imported two sets of records from the MAG dataset updated on 7th June. The first set contained 10,216 records, of which 128 were duplicates and 3,297 were discarded, using the new ML classifier, leaving 6,791 records. The second set contained 2,659 records, of which 34 were duplicates and 1,326 were discarded, leaving 1,299 records. In total, 8,090 records were added to the pool of 91,845 unscreened records, making 99,935 records to be coded.
  • On 29th June, 2021, we imported an additional set of 1,530 records, of which none were duplicates and 517 were discarded, leaving 1,013 records. These were added to the pool, making 100,948 records to be coded.
  • On 30th June, 2021, to reduce the size of the screening pool, we removed 16,036 records with classifier scores of less than 0.50, leaving 84,912 records to be coded.
  • For Version 63, published on 1st July, 2021, we coded 1,500 records: 874 were added to the Map, 301 were excluded and 325 were not on topic. The remaining 83,412 records were carried forward.
  • On 7th July, 2021, we imported two sets of records from the MAG dataset updated on 22nd June. The first set contained 6,641 records, of which 160 were duplicates and 1,752 were discarded, using the new ML classifier, leaving 4,729 records. The second set contained 10,170 records, of which 252 were duplicates and 4,251 were discarded, leaving 5,667 records. In total, 10,396 records were added to the pool of 83,412 unscreened records, making 93,808 records to be coded.
  • For Version 64, published on 8th July, 2021, we coded 1,539 records: 1,445 were added to the Map, 44 were excluded and 50 were not on topic. The remaining 92,269 records were carried forward.
  • For Version 65, published on 15th July, 2021, we coded 1,528 records: 1,230 were added to the Map, 111 were excluded and 187 were not on topic. The remaining 90,741 records were carried forward.
  • On 21st July, 2021, we imported two sets of records from the MAG dataset updated on 22nd June. The first set contained 10,419 records, of which 169 were duplicates and 3,336 were discarded, using the new ML classifier, leaving 6,914 records. The second set contained 6,000 records, of which 142 were duplicates and 2,750 were discarded, leaving 3,108 records. In total, 10,022 records were added to the pool of 90,741 unscreened records, making 100,763 records to be coded.
  • For Version 66, published on 22nd July, 2021, we coded 1,520 records: 1,088 were added to the Map, 202 were excluded and 230 were not on topic. The remaining 99,243 records were carried forward.
  • On 27th July, 2021, to reduce the size of the screening pool, we removed 9,921 records with classifier scores of less than 0.50, leaving 89,322 records to be coded.
  • For Version 67, published on 29th July, 2021, we coded 1,521 records: 1,342 were added to the Map, 78 were excluded and 101 were not on topic. The remaining 87,801 records were carried forward.
  • On 4th August, 2021, we imported two sets of records from the MAG dataset updated on 19th July, 2021. The first set contained 7,784 records, of which 108 were duplicates and 580 were discarded, using the new ML classifier, leaving 7,096 records. The second set contained 5,320 records, of which 86 were duplicates and 2,934 were discarded, leaving 2,300 records. In total, 9,396 records were added to the pool of 87,801 unscreened records, making 97,197 records to be coded.
  • From 4th to 6th August, we removed 70,952 records with classifier scores of less than 0.80, leaving 26,245 records to be coded.
  • On 18th August, 2021, we imported two sets of records from the MAG dataset updated on 2nd August, 2021. The first set contained 8,879 records, of which 104 were duplicates and 576 were discarded using the ML classifier, leaving 8,199 records. The second set contained 6,531 records, of which 123 were duplicates and 3,535 were discarded, leaving 2,873 records. In total, 11,072 records were added to the pool of 26,245 unscreened records, making 37,317 records to be coded.
  • For Version 68, published on 19th August, 2021, we coded 1,533 records: 1,265 were added to the Map, 157 were excluded and 111 were not on topic. The remaining 35,784 records were carried forward.
  • For Version 69, published on 26th August, 2021, we coded 1,523 records: 1,108 were added to the Map, 154 were excluded and 261 were not on topic. The remaining 34,261 records were carried forward.
  • For Version 70, published on 2nd September, 2021, we coded 1,521 records: 1,367 were added to the Map, 61 were excluded and 93 were not on topic. The remaining 32,740 records were carried forward.
  • On 7th September, 2021, we imported two sets of records from the MAG dataset updated on 16th August, 2021. The first set contained 440 records, of which 9 were duplicates and 30 were discarded, using the new ML classifier, leaving 401 records. The second set contained 11,131 records, of which 193 were duplicates and 3,980 were discarded, leaving 6,958 records. In total, 7,359 records were added to the pool of 32,740 unscreened records, making 40,099 records to be coded.
  • For Version 71, published on 9th September, 2021, we coded 1,421 records: 1,249 were added to the Map, 68 were excluded and 104 were not on topic. The remaining 38,678 records were carried forward.
  • On 14th September, 2021, we imported two sets of records from the MAG dataset updated on 30th August, 2021. The first set contained 3,733 records, of which 14 were duplicates and 428 were discarded, using the new ML classifier, leaving 3,291 records. The second set contained 10,536 records, of which 136 were duplicates and 3,794 were discarded, leaving 6,606 records. In total, 9,897 records were added to the pool of 38,678 unscreened records, and one duplicate was removed, making 48,574 records to be coded.
  • For Version 72, published on 16th September, 2021, we coded 1,866 records: 1,630 were added to the Map, 104 were excluded and 132 were not on topic. The remaining 46,708 records were carried forward.
  • For Version 73, published on 23rd September, 2021, we coded 1,466 records: 1,070 were added to the Map, 187 were excluded and 209 were not on topic. The remaining 45,242 records were carried forward.
  • On 28th September, 2021, we imported two sets of records from the MAG dataset updated on 13th September, 2021. The first set contained 8,270 records, of which 132 were duplicates and 533 were discarded, using the new ML classifier, leaving 7,605 records. The second set contained 5,408 records, of which 59 were duplicates and 2,930 were discarded, leaving 2,419 records. In total, 10,024 records were added to the pool of 45,242 unscreened records, making 55,266 records to be coded.
  • For Version 74, published on 30th September, 2021, we coded 1,524 records: 1,366 were added to the Map, 63 were excluded and 95 were not on topic. The remaining 53,742 records were carried forward.
  • On 1st October, 2021, we removed from the pool of 53,742 unscreened records, 26,516 records that scored <0.80 on our binary ML classifier (v3), leaving 27,226 records to be screened.
  • For Version 75, published on 7th October 2021, we coded 1,446 records: 1,331 were added to the Map, 38 were excluded and 77 were not on topic. We also added 19 records that were identified by a review on long COVID, and were not in the screening pool. The remaining 25,780 records were carried forward.
  • On 7th October, 2021, we searched MAG, updated on 13th September 2021, for any records related to those in the Map that have been tagged as long COVID. We imported 2,134 records, of which 329 were duplicates, and 848 were discarded using a new ML classifier designed to identify studies on long COVID, leaving 957 records that scored over the calibrated threshold score on this classifier. These were added to the unscreened records, making 26,737 records to be screened.
  • On 11th October, 2021, we removed 203 preprints from the pool of unscreened records, leaving 26,534 records to be screened.
  • On 12th October, 2021, we imported three sets of records from the MAG dataset updated on 27th September, 2021. The first set contained 8,806 records, of which 59 were duplicates and 1,551 were discarded, using the existing ML classifier, leaving 7,196 records. The second set contained 4,645 records, of which 32 were duplicates and 2,647 were discarded using the ML classifier, leaving 1,966 records. The third set contained 2,103 records, of which 7 were duplicates and 522 were discarded using a second, new ML classifier designed to identify studies on Long Covid, leaving 1,574 records. In total, 10,736 records were added to the pool of 26,534 unscreened records, making 37,270 records to be coded.
  • For Version 76, published on 14th October 2021, we coded 1,489 records: 1,386 were added to the Map, 40 were excluded and 63 were not on topic. The remaining 35,781 records were carried forward.
  • For Version 77, published on 21st October 2021, we coded 1,644 records: 1,198 were added to the Map, 136 were excluded and 310 were not on topic. The remaining 34,137 records were carried forward.
  • On 22nd October, 2021, we removed from the pool of 34,137 unscreened records, 7,283 records that scored <0.80 on our binary ML classifier (v3), leaving 26,854 records to be screened.
  • On 27th October, 2021, we imported three sets of records from the MAG dataset updated on 11th October, 2021. The first set contained 6,817 records, of which 28 were duplicates and 1,806 were discarded, using the ML classifier, leaving 4,983 records. The second set contained 9,610 records, of which 59 were duplicates and 3,463 were discarded using the ML classifier, leaving 6,088 records. The third set contained 1,234 records, of which 1 was a duplicate and 278 were discarded using the ML classifier, leaving 955 records. In total, 12,026 records were added to the pool of 26,854 unscreened records, making 38,880 records to be coded.
  • On 28th October, 2021, we imported 274 additional records, missed from the previous upload; 2 were duplicates, 152 were discarded and 120 were added to the pool, making 39,000 records to be coded.
  • For Version 78, published on 28th October 2021, we coded 1,538 records: 1,051 were added to the Map, 106 were excluded and 381 were not on topic. The remaining 37,462 records were carried forward.
  • On 2nd November, 2021, we removed 15,399 records that did not contain a keyword (see 'Identifying the evidence, Version 80 to date' above), making 22,063 records to be coded.
  • For Version 79, published on 4th November 2021, we coded 1,624 records: 1,059 were added to the Map, 161 were excluded and 404 were not on topic. The remaining 20,439 records were carried forward.
  • On 5th November, 2021, we added back 67 records that had been removed but did contain keywords, and 63 records that had been missed from Version 53, making 20,569 records to be coded.
  • On 10th November, 2021, we imported four sets of records from the MAG dataset updated on 25th October, 2021. The first set contained 5,830 records, of which 42 were duplicates and 4,395 were discarded, leaving 1,393 records. The second set contained 8,772 records, of which 45 were duplicates and 2,957 were discarded, leaving 5,770 records. The third set contained 1,815 records, of which 6 were duplicates and 1,700 were discarded, leaving 109 records. The fourth set contained 3,293 records, of which 603 were duplicates and 2,392 were discarded, leaving 298 records. In total, 7,570 records were added to the 20,569 unscreened records, making 28,139 records to be screened.
  • For Version 80, published on 11th November 2021, we coded 1,278 records: 1,222 were added to the Map, 46 were excluded and 10 were not on topic. The remaining 26,861 records were carried forward.
  • For Version 81, published on 18th November 2021, we coded 1,731 records: 1,500 were added to the Map, 169 were excluded and 62 were not on topic. Three records, identified while searching PROSPERO for completed reviews on long COVID, were imported and added to the Map. The remaining 25,130 records were carried forward.
  • For Version 82, published on 25th November 2021, we coded 1,505 records: 1,365 were added to the Map, 105 were excluded and 35 were not on topic. The remaining 23,625 records were carried forward.
  • On 26th November, 2021, we imported three sets of records from the MAG dataset, updated on 8th November, 2021, comprising a total of 18,599 records, of which 110 were duplicates, 5,395 were discarded using the ML classifiers, and 4,598 were removed using the keyword search. We used the BERT model to automatically code 5,778 records for the Map, and the remaining 2,718 were added to the pool of 23,625 unscreened records, making 26,343 records to be screened. We then removed 11,516 records that scored less than 0.80 on our binary ML classifier (v3), leaving 14,827 records to be screened.
  • For Version 83, published on 9th December, 2021, we coded 1,782 records: 1,640 were added to the Map, 94 were excluded and 48 were not on topic. The remaining 13,045 records were carried forward. Of the 5,778 records that were coded by the BERT model, 772 were excluded, 247 were not on topic and 4,759 were added to the Map.
  • On 10th December 2021, we applied the BERT model to the pool of 13,045 unscreened records. Based on the results, 9,656 were automatically assigned to topic codes (for Version 84 of the Map), leaving 3,389 records to be screened (see the auto-coding sketch after this list).
  • On 10th December 2021, we imported four sets of records from the MAG dataset, updated on 22nd November 2021. In total, we imported 21,764 records, of which 767 were duplicates, 728 were preprints and 12,639 were discarded after scoring by ML classifiers (7,707) or supplementary keyword searches (4,932). We applied the BERT model to the remaining 7,630 records; 5,354 were automatically assigned to topic codes (for Version 84 of the Map), and the other 2,276 records were added to the pool of unscreened records, making a total of 5,665 records to be screened.
  • For Version 84, published on 16th December, 2021, we coded 1,255 records: 968 were added to the Map, 183 were excluded, and 104 were not on topic. As 34 records had already been removed from the pool by the BERT model, 4,444 records remained and were carried forward. Of the 9,656 records from the screening pool that were coded by the BERT model, 34 had already been coded manually, 1,490 were excluded, 316 were not on topic and 7,816 were added to the Map. Of the 5,354 imported records that were coded by the BERT model, 647 were excluded, 218 were not on topic and 4,489 were added to the Map.
  • For Version 85, published on 6th January, 2022, we coded 751 records: 520 were added to the Map, 128 were excluded, and 103 were not on topic. The remaining 3,693 records were carried forward.
  • On 7th January, 2022, we imported three sets of records from the MAG dataset updated on 6th December, 2021. In total, we imported 17,522 records, of which 108 were duplicates and 8,728 were discarded after scoring by machine learning classifiers (5,277) or supplementary keyword searches (3,451). We applied the BERT model to the remaining 8,686 records; 5,817 were automatically assigned to topic codes (for publication in Version 86 of the Map), and the other 2,869 records were added to the 3,693 unscreened records, making 6,562 records to be screened.
  • For Version 86, published on 13th January, 2022, we coded 1,280 records: 1,088 were added to the Map, 131 were excluded, and 61 were not on topic. Of the 5,817 records that were coded by the BERT model, 846 were excluded, 223 were not on topic and 4,748 were added to the Map. The remaining 5,282 records were carried forward.
  • For Version 87, published on 20th January, 2022, we coded 1,220 records: 689 were added to the Map, 379 were excluded, and 152 were not on topic. The remaining 4,062 records were carried forward.
  • For Version 88, published on 27th January, 2022, we coded 1,509 records: 1,043 were added to the Map, 190 were excluded, and 276 were not on topic. The remaining 2,553 records were carried forward.
  • For Version 89, published on 3rd February, 2022, we coded 1,147 records: 606 were added to the Map, 269 were excluded, and 272 were not on topic. The remaining 1,406 records were carried forward.
  • On 6th February, 2022, we imported three sets of records from the OpenAlex dataset updated on 30th January, 2022. In total, we imported 47,857 records, of which 3,429 were duplicates, 14,029 were discarded after scoring by ML classifiers, and 6,195 were discarded after searches for selected keywords. Of the remaining 24,204 records, 16,961 were automatically assigned to topic codes (for publication in Version 90 of the Map), based on the results from applying the BERT model. The other 7,243 records were added to the 1,406 unscreened records, making 8,649 records to be screened.
  • For Version 90, published on 10th February, 2022, we coded 679 records: 647 were added to the Map, 19 were excluded, and 13 were not on topic. Of the 16,961 records that were coded by the BERT model, 2,921 were excluded, 426 were not on topic and 13,614 were added to the Map. The remaining 7,970 records were carried forward.
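Note on the screening workflow: several of the steps above combine de-duplication, a supplementary keyword screen and a score-based cut-off on a binary ML relevance classifier (records scoring below 0.80 were removed). The sketch below is purely illustrative and is not the project's code, which ran inside EPPI Reviewer Web; the record schema, the keyword pattern and the filter_batch / score_record names are assumptions made for the example.

    # Minimal sketch (Python), assuming records are dicts with 'id', 'title' and 'abstract'
    # keys and that score_record() is a pre-trained binary relevance classifier returning 0-1.
    import re

    RELEVANCE_THRESHOLD = 0.80  # records below this score were removed from the pool
    # Hypothetical stand-in for the supplementary keyword searches
    KEYWORD_PATTERN = re.compile(r"\b(covid|sars[- ]cov[- ]?2|coronavirus)\b", re.IGNORECASE)

    def filter_batch(records, seen_ids, score_record):
        """Split an imported batch into kept, duplicate and discarded records."""
        kept, duplicates, discarded = [], [], []
        for rec in records:
            if rec["id"] in seen_ids:  # already in the Map or the screening pool
                duplicates.append(rec)
                continue
            text = f"{rec.get('title', '')} {rec.get('abstract', '')}"
            # Discard records that fail the keyword screen or score below the threshold
            if not KEYWORD_PATTERN.search(text) or score_record(text) < RELEVANCE_THRESHOLD:
                discarded.append(rec)
                continue
            kept.append(rec)
            seen_ids.add(rec["id"])
        return kept, duplicates, discarded

The counts reported in each entry above (duplicates, discarded, added to the pool) correspond to the three outputs of a step of this kind.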
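Note on automatic topic coding: from Version 83 onwards, a BERT model assigned Topic codes to some records automatically, with the remainder falling back to manual screening. The sketch below only illustrates that confidence-gated split; the model identifier and the 0.90 cut-off are placeholders, not the project's actual model or its calibrated threshold.

    # Minimal sketch (Python), assuming a hypothetical fine-tuned topic classifier.
    from transformers import pipeline

    # Placeholder model id; the project's BERT topic model is not public.
    topic_classifier = pipeline("text-classification", model="my-org/covid-topic-bert")

    CONFIDENCE_CUTOFF = 0.90  # assumed cut-off for accepting an automatic Topic code

    def auto_code(records):
        """Auto-assign a Topic code where the model is confident; queue the rest."""
        auto_coded, to_screen = [], []
        for rec in records:
            prediction = topic_classifier(f"{rec['title']} {rec['abstract']}")[0]
            if prediction["score"] >= CONFIDENCE_CUTOFF:
                auto_coded.append({**rec, "topic": prediction["label"]})
            else:
                to_screen.append(rec)  # sent for manual coding
        return auto_coded, to_screen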