posted on November 15, 2017 11:53
Dylan Kneale and Antonio Rojas-García reflect on recent work exploring the use of evidence in local public health decision-making. In new climates of public health decision-making, where the local salience of research evidence becomes an even more important determinant of its use, they question how much research is being wasted because it is not generalisable to local settings.
Our review on evidence use in local public health decision-making cast a spotlight on patterns and drivers of evidence use in England. Locality was the recurring theme running throughout the review: evidence was prioritised where its salience to the local area was easily identified. Local salience included whether the evidence was transferable to the characteristics of the population, the epidemiological context, whether the proposed action was feasible (including economically feasible), but also included the political salience of the evidence.
To some, particularly those working in public health decision-making, these findings may feel all too familiar. The shift of decision-making to local government control in 2013 has served to increase the (local) politicisation of public health, which has consequent impacts on the way in which public health challenges are framed, the proposed actions to tackle these challenges, and the role of evidence in this new culture of decision-making. But these findings also reinforce other lessons for generators of evidence, because in many ways the review highlighted a thriving evidence use culture in local public health, but one that has a ‘make-do and mend’ feel about it.
Locality emerged as a key determinant of evidence use, but one where generators of evidence have only loosely engaged. There are likely good reasons for this, with research funders and publishers encouraging researchers to produce evidence that is relevant to audiences worldwide. However, this status quo means that in many ways, evidence generators may be guilty of adopting something akin to an ‘information deficit model’ around the use of research evidence – which inherently assumes that there is an information gap that can be plugged by ever-more peer-reviewed research or systematic reviews that reflect ‘global’ findings. This perspective overlooks epistemological issues around whether such knowledge is applicable, transferable, or otherwise relevant and useful in the context of local decision-making. It also assumes that decision-making is an event, rather than a cumulative learning process where outcomes are evaluated in different ways by different stakeholders; this perspective was reinforced in our own review by a paucity of studies that engaged with the nitty-gritty of public health decision-making processes.
The expression of a make-do and mend evidence culture is that public health decision-makers working in Local Authorities take what they can from existing evidence sources (make-do), and plug the gaps with programmes of local research and evaluation (mend), a finding that is also supported in later strands of our work (forthcoming). The fact that Local Authorities are undertaking their own research and evaluation activities to understand local people’s needs and preferences is, in many ways, to be commended. However, it also raises questions around the utility of public health research and how much public health research is being ‘wasted’, or rather its utility not maximised, through poor take-up in decision-making. Furthermore, our review shows that the methodological quality of this local research and evaluation activity is relatively unknown, and that much of this locally conducted research conventionally focuses on public health needs rather than on the effectiveness, or likely effectiveness, of action(s). Finally, an inward focus, with research being conducted in one site and not shared across others, impedes the dissemination of learning between similar contexts.
So what’s to be done? As a principle, it is clear that there needs to be more regular dialogue between evidence generators/synthesisers and evidence users to understand decision-making needs and processes. New developments such as a dedicated journal on research stakeholder involvement are likely to advance our knowledge on effective ways of working and developing relationships across boundaries. But new methods are also needed to help understand the parameters of generalisability of research conducted at national or global level, and how (or if) it relates to local areas. We also need to strengthen our knowledge around what effective knowledge translation approaches look like. Our own review found that ‘experts’ were frequently deployed, potentially in a knowledge translation role, although their characteristics and precise function were unclear, and it is welcome to see a review further exploring knowledge translation competencies underway. Finally, we need to explore how we can better support Local Authorities to conduct the type of locally focussed research that they are currently producing, and appear to find the most useful, and to better understand the characteristics and features of this research. Supporting any one of these activities involves ensuring that funding exists, on both sides of the evidence generator/user divide, to foster relationships and to support the development of new methodological advances. But knowledge exchange should be integral to the role of all researchers working in public health. Social scientists in academia spend approximately 9 per cent of their time on knowledge exchange activities; in contrast, we spend 23 per cent of our time on administration! More of our time needs to be protected for knowledge exchange activities if we are to better understand and respond to evidence needs in public health, and ensure that applied public health research can actually be applied.
The findings of our review reflected the literature on England, although devolution and localism in health and public health are features of many countries. However, another trend is occurring in English public health that is likely to perpetuate a make-do and mend evidence use culture. Public health spending is shrinking, with, for example, an expected £85 million slashed from sexual health and substance abuse services by cash-strapped Local Authorities in 2017/18 alone. In times of shrinking budgets, the judicious use of evidence should become all the more important, offering greater access to information on what works, increased opportunities for the effective use of resources, and improved certainty around the likelihood of success. However, unless we understand, and can better communicate, the generalisability of research evidence to decision-makers working locally, academic research evidence may fail to make the contribution to public health decision-making that it could, and indeed should, make.
About the Authors
Dylan Kneale is a Research Officer at the EPPI-Centre. He is interested in developing methods to enhance the use of evidence in decision-making, focusing on demography, public health and social exclusion.
Antonio Rojas-García is a Research Associate-Systematic Reviewer at the UCL Department of Applied Health Research. Prior to joining UCL, he has been working on several research projects, mostly focused on health inequalities and health systems. Within UCL, he has been part of a number of projects exploring the use of research evidence, the impact of delayed discharges on patient, staff and health systems, among others.
1. Kneale D, Rojas-García A, Raine R, Thomas J: The use of evidence in English local public health decision-making. Implementation Science, 2017, 12(1):53.
2. Marteau TM, Sowden AJ, Armstrong D (eds.): Implementing research findings into practice: beyond the information deficit model. BMJ Publishing; 2002.
3. Wolfenden L, Ziersch A, Robinson P, Lowe J, Wiggers J.: Reducing research waste and improving research impact. Australian and New Zealand journal of public health, 2015, 39(4):303-4.
4. Mallidou AA, Atherton P, Chan L, Frisch N, Glegg S, Scarrow G: Protocol of a scoping review on knowledge translation competencies. Systematic reviews, 2017, 6(1):93.
5. Bullock A, Hughes R: Knowledge Exchange and the Social Sciences: A Report to ESRC from the Centre for Business Research. Cambridge: University of Cambridge; 2016.
6. Ham C, Timmins N: Managing health services through devolved governance: A perspective from Victoria, Australia. London: The King's Fund; 2015.
7. Gulland A: Spending on public health cut as councils look to save money. BMJ, 2017, 358.
Image Source (© 1939-1946, public domain).
posted on November 09, 2017 11:00
Sandy Oliver discusses whether the worlds of academia and humanitarianism can combine to improve the delivery and understanding of the processes and benefits of humanitarian aid through use of evidence.
Interest is growing within the humanitarian sector in drawing on systematic reviews of studies that assess the effects of different policies or practices when making decisions. As in other sectors, such research evidence is considered alongside what else is known, such as competing priorities, social norms, available resources, or the ease of implementing a programme. Professor Sandy Oliver argues that in contexts where rigorous studies of effects are few and far between, perhaps because conducting research is difficult in such circumstances, it is useful to learn from systematic reviews that encompass other forms of knowledge commonly held by individuals and organisations delivering humanitarian aid. These broader systematic reviews increasingly come from partnerships of academics and humanitarian organisations. Strengthening the links between research and work in the field helps create evidence-informed policy/practice, and policy/practice-informed evidence.
Finding evidence of effects
Humanitarian crises, whether ‘man-made’ or following natural disasters, seem overwhelming not only to the people caught up in them, but also to the individuals and organisations wanting to help. How do we know what will help? Humanitarian workers have first-hand experience. Well established organisations have tried and tested procedures. Researchers offer findings from studies designed to understand the nature and scale of the challenges being faced, to compare the effects of different programmes or systems, and to investigate how aid is delivered by organisations and individuals. This knowledge, whatever its source, is widely dispersed and can be difficult to find. One solution is to seek research findings to inform policy and practice by drawing on systematic reviews of research findings. There is growing enthusiasm for systematic reviews in the humanitarian sector. As with any new movement there are new challenges. We are encountering some of these challenges as we help to inform the efforts of humanitarian organisations and evidence champions who try to improve the lives of displaced populations in areas of conflict.
An immediate challenge for humanitarians is how to access systematic reviews quickly. We have found two particularly valuable types of resources: databases of systematic reviews maintained by Evidence Aid and 3ie; and evidence gap maps prepared by 3ie and the International Rescue Committee.
These resources signpost systematic reviews offering high quality evidence (often from controlled trials) of ways to improve access to health care, such as through involving communities in planning, implementing and/or monitoring health services, or by training traditional birth attendants (Obuku et al., 2017). Similarly, we can find high quality evidence showing the benefits of cash or vouchers that give families access to food (Doocy and Tappis, 2016), and of WASH (Water, Sanitation and Hygiene) interventions for reducing the spread of disease (Yates et al., 2017).
When evidence of effects is lacking
Particularly challenging, humanitarian organisations tell us, are the logistics and organisation required to deliver supplies and coordinate multiple agencies. Learning how to address these challenges requires different sorts of studies. Frustratingly, not only are there fewer systematic reviews addressing these challenges, but those that exist find relatively little relevant evidence.
In some cases systematic reviews tell us more about what not to do. Efforts addressing violence against women and girls have to be made in the absence of counterfactual studies offering evidence about the effects of prevention programmes. Instead, findings of observational or qualitative studies warn of potential harms. Livelihood programmes that raise women’s earned income may lead to increases in sexual violence, which is already common in areas of conflict. Women can experience: coercion by partners to hand over their earnings; increased risk when collecting firewood to sell; and insecure employment where lack of regulation exacerbates vulnerability (Spangaro et al., 2013). Where justice systems are weak, encouraging survivors of sexual violence to testify can be counterproductive: rather than perpetrators being convicted, survivors may experience retaliation, stigma and ostracism (Fulu et al., 2014). Although they do not cite the research, international guidelines recognise these challenges (Inter-Agency Standing Committee, 2015).
In this way, where uncertainties about effective policies or practices are high, and the available research is meagre, learning has come from systematic reviews that address how organisations work, not only the effects of their work. For instance, where research has not addressed the effects of different models or mechanisms for coordinating humanitarian efforts (Akl et al., 2015), learning is available from a systematic review of studies describing how organisations have worked together (Lotfi et al., 2016). The most common model is clusters of national and international humanitarian agencies liaising around the main humanitarian sectors (such as food security, WASH or education) within a country, under the overall leadership of a humanitarian coordinator. These clusters are designated by the Inter-Agency Standing Committee (IASC) and have clear responsibilities for coordination.
Knowledge from practice, organisational procedures and research
Conventionally, research knowledge is seen as distinct from organisational or practice knowledge. However, the boundaries are becoming blurred as researchers involve humanitarian organisations in designing their studies, and conversely as humanitarian organisations take the lead in conducting or commissioning research. An example of the former is the research team in Beirut systematically reviewing the coordination of health services (Akl et al., 2015), who began their work by convening key stakeholders to discuss the key questions needing answers. Similarly, reviewers addressing sexual violence in conflict and post-conflict zones convened an advisory group of consultants and programme directors with in-depth knowledge of the field (Spangaro et al., 2013). Both reviews benefited from the direct involvement of people well placed to inform the project and feed the findings directly into policy processes and practice. Examples of the latter are Oxfam and UNICEF, both of which have commissioned or conducted systematic reviews to inform their own humanitarian work. Oxfam, in partnership with the Feinstein International Centre, has its own humanitarian evidence programme. UNICEF’s Baby Friendly Initiative is supported by systematic reviews of the effects of infant healthcare, where much of the evidence comes from randomised controlled trials. In the area of peacebuilding, where RCTs are rare, UNICEF’s systematic review conducted in partnership with academics assesses how educational interventions may contribute to enhancing the agency of youth as peacebuilders (Lopes Cardozo et al., 2015).
Areas of work that are least developed have most to gain from systematic reviews conducted with or by humanitarian agencies who are able to draw on internal reports of the agencies themselves. This is well illustrated by UNICEF’s systematic review of humanitarian programming and monitoring in inaccessible conflict settings (Chaudhri et al., 2017). This review identified only 14 academic papers, largely observational studies. These were supplemented by many more organizational guidelines, programming notes, program evaluations, presentations, and meeting notes which, between them, offered concepts and definitions, and addressed the preliminary steps required to advance this burgeoning field.
Strengthening the evidence ecosystem
Evidence-informed humanitarian aid has had a low profile. Yet many of the essential components mentioned above are already in place. Systematic reviews are readily accessible through databases and evidence gap maps. Guidelines are developed by consensus at international, regional and national levels. Academics and humanitarian agencies work together to pool resources from their different worlds. The next step to developing an evidence ecosystem for humanitarian aid may be strengthening the links between these components: guidelines informed by evidence, and humanitarian-academic partnerships shaping evidence syntheses. A likely mechanism for doing so could well be the existing system of clusters. The Inter-Agency Standing Committee already designates the clusters, and develops guidance for agencies working in the field and for inter-agency working. The potential for bringing evidence and guidance closer together is apparent in the area of gender-based violence, where an overview of the evidence (The Global Women’s Institute and International Rescue Committee, 2016) mentions the Inter-Agency Standing Committee (2015) guidance. An even closer relationship between guidance and evidence is illustrated by a systematic review in the area of mental health (Bangpan et al., 2017), which employed the IASC’s (2007) intervention framework to present the available evidence. Lastly, the potential role for clusters is apparent from the systematic review of humanitarian work in inaccessible conflict settings, which was supported by the World Health Organisation as the Health Cluster Lead Agency and drew on guidance and research evidence (Chaudhri et al., 2017).
Learning how to deliver humanitarian aid, and about the effects it has, benefits from applying the principles of systematic reviewing to a broad range of documents that report research, policies, practices or discussions. Although the field of evidence-based humanitarian intervention is relatively young, all the basic components that would allow a specialised evidence ecosystem to emerge are already in place. At this stage, it seems crucial to focus on inter-sector collaboration and cross-fertilisation of ideas. In this way, academics and humanitarian agencies working together can strengthen the links between their two worlds and further develop evidence-informed humanitarian aid.
About the image: Internally displaced person’s camp in Dili, Timor-Leste, 2006. © Anthony Zwi.
About the author
Professor Sandy Oliver is the Director of the Social Science Research Unit at UCL, and Deputy Director of the EPPI-Centre.
Akl EA, El-Jardali F, Bou Karroum L, El-Eid J, Brax H, Akik C, et al. (2015) Effectiveness of Mechanisms and Models of Coordination between Organizations, Agencies and Bodies Providing or Financing Health Services in Humanitarian Crises: A Systematic Review. PLoS ONE10(9). DOI: 10.1371/journal.pone.0137159.
Bangpan M, Dickson K, Felix L, Chiumento A. (2017). The impact of mental health and psychosocial support interventions on people affected by humanitarian emergencies: A systematic review. Humanitarian Evidence Programme. Oxford: Oxfam GB.
Chaudhri S, Cordes K, Miller N (2017) Humanitarian programming and monitoring in inaccessible conflict settings: A Literature Review. WHO - Health Cluster.
Doocy S, Tappis H. (2016) Cash-based approaches in humanitarian emergencies: a systematic review. 3ie Systematic Review Report 28. London: International Initiative for Impact Evaluation (3ie)
Fulu E, Kerr-Wilson A, Lang J. (2014) What works to prevent violence against women and girls? Evidence Review of interventions to prevent violence against women and girls. DFID, London.
Inter-Agency Standing Committee (2007) IASC Guidelines on Mental Health and Psychosocial Support in Emergency Settings. Geneva: IASC.
Inter-Agency Standing Committee (2015) Guidelines for Integrating Gender-Based Violence Interventions in Humanitarian Action: Reducing risk, promoting resilience and aiding recovery. Geneva: IASC.
Lopes Cardozo MTA, Higgins S, Maber E, Brandt CO, Kusmallah N, Le Mat MLJ. (2015) Literature Review: Youth Agency, Peacebuilding and Education, Research Consortium Education and Peacebuilding, University of Amsterdam.
Lotfi T, Bou-Karroum L, Darzi A, Hajjar R, El Rahyel A, El Eid J, Itani M, Brax H, Akik C, Osman M, Hassan G, El-Jardali F, Akl E. (2016) Coordinating the Provision of Health Services in Humanitarian Crises: a Systematic Review of Suggested Models. PLOS Currents Disasters. Edition 1. DOI: 10.1371/currents.dis.95e78d5a93bbf99fca68be64826575fa.
Obuku EA, Stewart R, Mijumbi R, Ochan M, Achana F, Akena D, Nakitende A, Ssemata A, Kinengyere A, Semakula D, Ssenono R, Nsangi A, Lalitha R, Mwesiga E, Akite J, Basaza R, Newbrander W, Okello J, Sewankambo N, with Dickson K, Oliver S. (2017) Working with non-state providers in post-conflict and fragile states in primary healthcare service delivery: a systematic review. London: EPPI-Centre, Social Science Research Unit, UCL Institute of Education, University College London.
Spangaro J, Zwi A, Adogu C, Ranmuthugala G, Davies GP, Steinacker L (2013) What is the evidence of the impact of initiatives to reduce risk and incidence of sexual violence in conflict and post-conflict zones and other humanitarian crises in lower and middle-income countries? A systematic review. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.
The Global Women’s Institute and International Rescue Committee. (2016) Evidence brief: What works to prevent and respond to violence against women and girls in conflict and humanitarian settings? Washington DC: George Washington University and London: IRC.
Yates T, Allen J, Leandre Joseph M, Lantagne D. (2017) WASH interventions in disease outbreak response. Humanitarian Evidence Programme. Oxford: Oxfam GB.
posted on November 08, 2017 14:18
Many humanitarians are evidence-aware, but may find it difficult to draw on what is known or to find knowledge that speaks to their context. They may also be pressed for time to find, or to judge the relevance of, what is often a dispersed literature. To address this gap, the Humanitarian Evidence Programme, a partnership between Oxfam and the Feinstein International Center at Tufts University, published eight systematic reviews in areas identified as a priority by humanitarian policy and practitioner stakeholders. Typical of the sector, and similar to international development, decision-makers ask very broad questions. Kelly Dickson and Mukdarut Bangpan reflect on the challenges we encountered when producing a mixed-methods evidence synthesis for this programme, on mental health and psychosocial programmes for people affected by humanitarian emergencies.
Broad policy questions need comprehensive and flexible conceptual frameworks.
The humanitarian sector is a diverse field of inquiry, encompassing earthquakes and typhoons, protracted conflict, acts of terrorism, biological hazards, and other natural and man-made disasters. Our brief, to consider the effectiveness and implementation of programmes addressing the mental health and psychosocial well-being of both children and adults, was similarly broad. To guide the review, we needed a comprehensive and flexible conceptual framework: one extensive enough to capture the diversity of definitions in use, but also specific enough to guide our reviewing task, from searching and screening through to shaping the synthesis. To achieve this, we drew heavily on the existing research literature for widely recognised definitions of humanitarian emergencies and outcomes (e.g. post-traumatic stress disorder, depression, anxiety, social support). We were also able to draw on existing guidelines on Mental Health and Psychosocial Support in Emergency Settings, which had benefitted from stakeholder input from United Nations agencies, NGOs and academic institutions. These guidelines provided an intervention framework, in the form of a 'layered system of MHPSS support', which was used to map and signpost studies in the review and supported useful identification of gaps in the evidence base.
Emerging research areas need tailored search strategies…
As with most systematic reviews, we aimed to conduct a comprehensive search of the literature to identify the best available evidence to answer the review questions. We found that, as in previous reviews, relying on bibliographic databases alone is not sufficient. Although some areas of research, such as the effects of clinical treatment, are well indexed in databases, other areas, such as studies of the delivery and receipt of mental health programmes, were more often found by hand-searching, a time-consuming but productive activity.
…and inclusive but transparent approaches to judging rigour and relevance.
One of the defining characteristics of systematic reviews is not only the transparent and accountable methods used to identify relevant literature, but also the judgements that are made about the quality, and thus the trustworthiness, of the evidence to inform policy and practitioner decision-making. These judgements can be based on standardised tools for judging the methodological quality of evidence, or on tools designed specifically for the review in hand. A common criticism of systematic reviews, and a concern for the Humanitarian Evidence Programme, is the potential to miss important lessons from studies of lower quality. Our approach was two-fold. The first step was to retain all studies, but to ‘grade’ the evidence informing the question on impact, taking into consideration study quality ratings, the number of studies, and effect sizes, before making summary statements on the direction of effect (e.g. there is strong, moderate or limited evidence). The second was to judge the relevance of studies with qualitative evidence informing our question on implementation, clearly signposting this throughout the evidence synthesis. This approach draws on our institutional learning, and addressed concerns from policy-makers and practitioners keen to use evidence.
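As an illustration of the grading step described above, the sketch below encodes one possible set of rules. The post does not state the actual thresholds used in the review, so the function name, inputs and cut-offs here are entirely hypothetical:

```python
# Purely illustrative sketch: the review's actual grading rules are not
# given in the post, so the thresholds and labels below are invented.

def grade_evidence(n_studies, quality_ratings, effect_directions):
    """Summarise a body of impact evidence as 'strong', 'moderate' or
    'limited', based on study quality, number of studies, and whether
    the direction of effect is consistent across studies.

    quality_ratings:   one of 'high'/'medium'/'low' per study
    effect_directions: +1 (benefit), 0 (no effect) or -1 (harm) per study
    """
    if n_studies == 0:
        return "no evidence"
    high_quality = sum(1 for q in quality_ratings if q == "high")
    # Consistent if all non-null effects point the same way.
    consistent = len({d for d in effect_directions if d != 0}) <= 1
    if n_studies >= 3 and high_quality >= 2 and consistent:
        return "strong"
    if n_studies >= 2 and (high_quality >= 1 or consistent):
        return "moderate"
    return "limited"
```

A summary statement would then read, for example, "there is moderate evidence of benefit". Real grading frameworks such as GRADE weigh many more factors, including risk of bias, imprecision and publication bias.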
Systematic reviews shaped by humanitarian organisations are best shared through humanitarian organisations.
Not only did we draw on questions and concepts defined by humanitarian organisations, but we also found them essential for sharing the findings. Our interpersonal relationships and networks (see Langer et al., 2016) have increased the visibility of the review findings. Through working closely with others as part of a ‘community of practice’ we were able to discuss the current evidence base and its implications for the future, and to use websites, conferences and webinars to reach wider policy and practice networks.
Producing a mixed-methods evidence synthesis in a new arena such as the humanitarian evidence sector has presented us with an exciting opportunity to apply our skills and adapt synthesis methodology to benefit the field. This has required immersing ourselves in the policy literature to develop an appropriate conceptual framework, drawing on our technical expertise to identify and appraise studies, and building new social networks to support dissemination activities. Doing so has, once again, enabled us to draw on our insider knowledge of working at the interface of policy and practice to produce relevant and accessible evidence. The result is evidence where both relevance and rigour are maximised to suit decisions made in challenging circumstances.
About the authors
Mukdarut Bangpan is a research officer at the EPPI-Centre with many years' experience of conducting systematic reviews and supporting review teams, particularly in the area of international development.
Kelly Dickson is a researcher at the EPPI-Centre, experienced in conducting mixed-methods systematic reviews and building research synthesis capacity across disciplines, including international development. She is also a psychotherapist with an interest in the mental health and psychosocial wellbeing of adults and children.
Bangpan M, Lambert F, Chiumento A, Dickson K (2016). The impact of mental health and psychosocial support programmes for populations affected by humanitarian emergencies: a systematic review protocol. Oxford: Oxfam.
Inter-Agency Standing Committee (2007). IASC guidelines on mental health and psychosocial support in emergency settings. Geneva: IASC. https://interagencystandingcommittee.org/
Langer L, Tripney J, Gough D (2016). The science of using science: researching the use of Research evidence in decision-making. London: EPPI-Centre, Social Science Research Unit, UCL Institute of Education, University College London
posted on February 10, 2017 17:15
The replication crisis, publication bias, p-hacking, HARKing (hypothesising after the results are known), bad incentives, undesirable pressures, and probably other factors all contribute to diminishing the trustworthiness of published research, with obvious implications for research synthesis. Sergio Graziosi asks whether demanding simple theoretical clarity might be part of the solution.
Within all the talk about publication bias, p-hacking, the replication crisis and so forth, I am finding it harder and harder to keep track of all the proposed solutions. While trying to organise my thoughts, I have realised that the absence of theoretical clarity underlies many of the problems that are currently being discussed. Perhaps this realisation is enough to justify a slight change in focus. For systematic reviewing, as Mark has reminded us, figuring out what results should be trusted, and perhaps more importantly, finding auditable and reasonably objective ways to do so is, naturally, of paramount importance. I do not think I need to convince anyone about this latter point, and will take it for granted in what follows.
More than ten years after warning us that most published research is false, Ioannidis (with colleagues) has produced a manifesto for reproducible science (Munafò et al. 2017). It is well worth a read, but it did not soothe my disquiet and confusion. On one hand, the manifesto comes with a range of concrete, actionable and agreeable suggestions. On the other, the same suggestions are, to my eyes, already worrying: the value of each remedial measure is likely to depend on how robust its implementation can be. Let’s consider pre-registration: it is a very attractive strategy and I am sure it is already helping to curb practices such as HARKing and p-hacking. However, on closer examination, one can find the contribution from Wicherts et al., who list the degrees of freedom that researchers may exploit (consciously or not, legitimately or not) in their search for “significant” results. The list includes 34 potential problems, framing the discussion around the pitfalls that should be avoided when relying on pre-registration. Ouch.
Checking for signs of 34 separate questionable practices when reviewing a single study in conjunction with its preregistration looks already daunting and close to utopian – especially when one remembers that the authors’ interest is to paint their own research in the most positive light. How many researchers are likely to critically consider each relevant pitfall of each step of their own workflow, and do so at the right time?
On the other side of the fence, to compile a systematic review one would need to go through the same checklist for every study considered, and perhaps check the consistency of decisions across multiple reviewers. If I extrapolate, and assume that each of the twenty-plus strategies proposed in Munafò’s manifesto comes with a similar number of ways of failing to deliver its full potential (even if this doesn’t entail a combinatorial explosion, as there are many overlaps), my mind vacillates and immediately starts looking for strategies with lower cognitive costs.
What I will propose is indeed a shortcut: a (hopefully handy) heuristic that revolves around the role of theory in primary research. My starting point is a concise list of typical research phases (up to, and excluding, research synthesis as such), being mindful that many alternatives exist. The table below may be read as a simplified version of the list produced by Wicherts et al., compiled with two underlying objectives: keeping it manageable, and highlighting the role of theory. My main hunch is that when one clarifies the role played by theory in a given research phase, pitfalls, dangers and malpractice may become easier to isolate. You may decide to read what follows as an argument advocating epistemological clarity in scientific reporting.
Research phase and the role of theory: for each phase, an example, the typical pitfalls, and the justifications one might hear for them.

**1. Theory building.** Typically done to try to accommodate the evidence that isn’t satisfactorily accounted for by existing theories.

*Identify a need.* As anomalies accumulate, people start asking: “do we need an entirely new theory?”
- *Examples:* historically, theories such as electromagnetism; more recently, the creation of countless classifications of psychological ‘types’.
- *Typical pitfalls:*
  1. Fail to account for all/enough available evidence.
  2. Fail to realise how current evidence may fit into existing frameworks.
  3. Give new names/labels to existing concepts; fail to appreciate how existing theories use different labels to point at similar concepts or mechanisms.
  4. Fail to capture regularities that directly depend on non-contingent causal chains.
- *Possible justifications:*
  1. No new theory can be expected to account for all the evidence from day zero.
  2. That’s how theories degenerate: what if a new theory can accommodate more evidence with fewer ad-hoc extensions?
  3. Existing theories are confusing, imprecise, too broad or too narrow.
  4. This can be established only post hoc: one needs first to theorise and then check that predictions do apply; only then can one focus on causal explanations (secondary hypotheses).

*Draft a new theory.*

**2. Formulate new hypotheses.** Within a theoretical framework, many separate hypotheses can be identified.

*Data exploration: find patterns in existing data.*
- *Example:* analyses and re-analyses of longitudinal studies.
- *Typical pitfalls:*
  1. Spurious correlations.
  2. Pattern-finding bias (we tend to see patterns in noise).
  3. Mistaking homogeneity for random noise (the opposite of pattern-finding).
  4. Survivorship bias.
- *Possible justification:* these pitfalls are irrelevant, because hypotheses need to be tested anyway.

*Deductively explore the consequences of a given theory.*
- *Example:* recalculating the expected light-bending effect of gravity as a result of general relativity.
- *Typical pitfalls:*
  1. Logic failures and/or lack of imagination.
  2. Overconfidence: producing hypotheses that are too far removed from tested theory.
  3. Lack of ambition: producing ever more detailed hypotheses, just to get publishable positive results.
- *Possible justifications:* 1–2: as above; 3: this is how “normal science” is done!

**3. Test a hypothesis.**
- *Example:* measure the effect of a drug.
- *Typical pitfalls:*
  1. Bad/insufficient clarity on what is tested.
  2. Bad experimental design.
  3. Low power.
  4. Measure proliferation (encourages p-hacking).
  5. Unpublished negative results / publication bias.
- *Possible justifications:* 1: that’s science, people can’t know it all already; 2–4: budget/capacity – science happens in the real world, we can do what we can do; 5: ditto, one can’t spend ages trying to publish results that no one wants to read.

**4. Make predictions – applied science.**
- *Examples:* build bridges, design new microprocessors.
- *Typical pitfalls:*
  1. Overconfidence: stretching a theory beyond its known/expected limits of applicability.
  2. Failure to account for theoretical boundaries (not knowing when/why a theory stops applying).
  3. Failure to measure outcomes.
- *Possible justifications:* 1: “but science told us it would work!”; 2: can’t anticipate unknown unknowns; 3: we don’t need to, because our theory is solid.
The interesting part of this exercise is how many of the known problems are not captured, or are only marginally captured, by the table above – I would argue that a good number fall into the cracks between its cells. My point, then, is that clarifying what one is doing (am I producing a new hypothesis? Am I testing a given one? Am I trying to see whether we should start looking for new possible theories?) should be second nature for all scientists (alas, I don’t think it is). This would make it easier to check for well-known pitfalls, and also to avoid stumbling on the boundaries between separate tasks. For example, p-hacking and HARKing can be deliberate malpractice, or could result from “bad/insufficient clarity on what is tested”; but it seems to me that they may also be caused by a lack of separation between hypothesis testing and data exploration.
For example, we may imagine our typical scientist: in this imaginary scenario, her normal work starts by testing a hypothesis – to keep it simple, we’ll assume she is testing the effectiveness of drug D. Let’s imagine she finds a tiny effect size, but with a considerable number of outliers which seem to cluster together. The original experiment was testing a hypothesis, so the first result is that drug D doesn’t appear to work. However, we now have new data to explore (a different task), and perhaps we find that the outliers all have trait T in common. The same experiment has therefore yielded a second (separate!) result: a new hypothesis – perhaps drug D only works on subjects with T.
One study yielded two “results”: one is negative or inconclusive; the other is a new hypothesis. Being a new hypothesis, it needs to be tested. In our oversimplified example, the data suggested the new hypothesis, and therefore the same data cannot also confirm it.
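This two-phase scenario can be sketched as a toy simulation (all numbers, and the `simulate_trial` helper, are invented purely for illustration – this is not real trial data):

```python
import random
import statistics as stats

random.seed(1)  # fixed seed, purely so the illustration is reproducible

def simulate_trial(n=200):
    """Hypothetical trial of 'drug D': by construction, it works only on trait T."""
    subjects = []
    for _ in range(n):
        has_t = random.random() < 0.2        # ~20% of subjects carry trait T
        effect = 1.5 if has_t else 0.0       # the drug helps only those with T
        outcome = random.gauss(effect, 1.0)  # noisy measured outcome
        subjects.append((has_t, outcome))
    return subjects

trial = simulate_trial()

# Phase 1 - hypothesis testing: the pre-specified question was about everyone.
overall = stats.mean(outcome for _, outcome in trial)

# Phase 2 - data exploration: the outliers turn out to share trait T.
with_t = stats.mean(outcome for has_t, outcome in trial if has_t)
without_t = stats.mean(outcome for has_t, outcome in trial if not has_t)

print(f"overall mean effect: {overall:.2f}")  # small: 'drug D does not appear to work'
print(f"with trait T:        {with_t:.2f}")
print(f"without trait T:     {without_t:.2f}")
# The subgroup pattern was *found* in this dataset, so it cannot also be
# *confirmed* by it: the new hypothesis needs a fresh, pre-specified test.
```

Running a new, pre-registered trial restricted to the subgroup question would be the legitimate confirmatory step; re-using the exploratory dataset would not.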
In other words, perhaps we can agree that clarifying and segregating tasks based on how they relate to theory has helped to identify a well-known source of unreliable science, and has concurrently made it clear how best to use the data collected. Most scientific work actually happens across multiple phases; nevertheless, having a clear conceptual picture of the boundaries could be a useful way to avoid repeating well-known mistakes. It goes without saying that such conceptual clarity, if transferred into published research articles, also has the potential to make the task of systematic reviewers less error-prone and less dependent on hard-to-audit personal judgements.
Is this simplistic proposal enough to overcome all the problems mentioned above? Of course not. It is a mere heuristic – a simple rule of thumb, which I hope might be useful to our readers. If time permits, I hope to explore this same theme in the context of conducting systematic reviews in a follow-up post. In my studies (molecular biology and neuroscience), no one ever helped me realise how the role of theory relates to the different mistakes that may plague scientific results. In fact, no one ever discussed the epistemological foundations of science; I guess they were mostly taken for granted. Thus, perhaps my suggestion is directed at educators in particular: discussing and clarifying the distinctions I’ve mentioned here might be a low-cost strategy to help the next generation of scientists avoid repeating our own mistakes.
About the author
Sergio Graziosi is the EPPI-Centre IT manager, and one of the developers of EPPI-Reviewer. Designing tools to conduct systematic reviews implicitly requires exploring what can and cannot count as good/reliable evidence. As a consequence, he’s been exploring science epistemology on his own (non-academic) blog.
Kerr NL (1998). HARKing: hypothesizing after the results are known. Personality and Social Psychology Review, 2 (3), 196-217. PMID: 15647155
Head ML, Holman L, Lanfear R, Kahn AT, & Jennions MD (2015). The extent and consequences of p-hacking in science. PLoS biology, 13 (3) PMID: 25768323
Ioannidis JP (2005). Why most published research findings are false. PLoS medicine, 2 (8) PMID: 16060722
Munafò, M., Nosek, B., Bishop, D., Button, K., Chambers, C., Percie du Sert, N., Simonsohn, U., Wagenmakers, E., Ware, J., & Ioannidis, J. (2017). A manifesto for reproducible science Nature Human Behaviour, 1 (1) DOI: 10.1038/s41562-016-0021
Wicherts JM, Veldkamp CL, Augusteijn HE, Bakker M, van Aert RC, & van Assen MA (2016). Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking. Frontiers in psychology, 7 PMID: 27933012
Image Credits: © Munafò et al., Nature Publishing Group (CC-BY).
posted on December 23, 2016 09:53
[Warning: do not read this with small kids around!] Mark Newman poses some questions in keeping with the seasonal festivities: what does it mean to believe in Father Christmas? Does it really differ that much from belief in the role of evidence? We at the EPPI-Centre are happy to rise to the occasion and wish all of our readers a very Merry Christmas and a happy and prosperous New Year.
This festive time of year provides ample cause to reflect on the nature of ‘belief’. After all, there is a lot of ‘believing’ going on at this time of year. Believing that Father Christmas brings your presents down the chimney for example. Now my kids are old enough I can go public and say I don’t believe in Father Christmas. I don’t ‘believe’ most of the foundation myths of Christmas actually, yet I will still be celebrating Christmas with family and friends; happily participating in Christmas rituals like singing carols, putting stockings by the fire on Christmas eve and so on. This got me thinking about what we mean when we say ‘I believe’ or ‘I don’t believe’. Is ‘belief’ (or not) about Christmas the same thing as ‘belief’ or not about the meaning of evidence?
In one way, I think you could say that yes, we can and do use the term ‘believe’ in different ways in different contexts. Talking about ‘believing’ in the context of faith, myth, tradition and shared communal social norms is meaningfully different from talking about ‘believing’ in the context of a discussion about the interpretation of research evidence. But of course, as an advocate of the greater use of research evidence to inform decision-making, I would say that, wouldn’t I? So I think it is important to recognise that there are actually quite specific ways in which I might be using the term ‘believe’ in the same way in both contexts.
The claim that Father Christmas came down the chimney to bring your presents is a ‘knowledge claim’. I can therefore ask what the warrant for that knowledge claim is. A warrant is provided by some combination of theory, empirical research evidence and personal experience (1). When I say that I do not believe that Santa came down the chimney to bring your Christmas presents, I am saying that theory, empirical evidence and personal experience do not provide a warrant for that knowledge claim. This is what we are saying when we talk about believing or not believing the evidence: does the warrant provided by the combination of theory, empirical research evidence and personal experience support the knowledge claim made by the researchers?
So no, I don’t believe in Father Christmas but I am still looking forward to seeing what he brings on Christmas day and enjoying all the festivities of the season. I hope you all do too.
About the Author
Mark Newman is a Reader in Evidence informed Policy and Practice at UCL Institute of Education and an Associate Director of the EPPI-Centre. He has a particular interest in evidence use in the context of the Education and Training of healthcare professionals. He will be celebrating Christmas in London where he lives with his two children.
1. James, M., Pollard, A., Rees, G., & Taylor, C. (2005). Researching learning outcomes: building confidence in our conclusions Curriculum Journal, 16 (1), 109-122 DOI: 10.1080/0958517042000336863
posted on December 09, 2016 16:53
It is conventional in the social sciences to report p-values when communicating the results of statistical analyses. There are, however, increasing criticisms of the p-value for being open to misinterpretation and – worse – at risk of falsely indicating the presence of an effect. Alison O’Mara-Eves considers a further problem: failing to engage readers with the meaning behind the numbers. Some alternative ways of reporting the results of analyses are considered.
In the social sciences, statistical analyses are regularly used to test hypotheses and interrogate the collected data. The typical output of such analyses is a mean, correlation, or other statistical value that represents some trend in the data – causal relations, similarities, or differences. This output is a summary of what we have observed in the collected data, or a value that we can infer will also represent other samples from the same population. Attached to that summary statistic or inferential statistic is usually a p-value.
Statistical p-values are often represented in published reports as asterisks, the number of which tells the reader something about the p-value. Generally, a p-value of less than or equal to .05 is represented by *, one of less than or equal to .01 by **, and one of less than or equal to .001 by ***. Whilst most readers of research might not reflect too much on what the numbers mean, the reader will typically get more excited by ‘more asterisks’ (assuming that they are hoping for a statistically significant outcome).
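The convention amounts to nothing more than a threshold lookup; a minimal sketch (the cut-offs below are the common convention described above, not a universal standard):

```python
def significance_stars(p):
    """Map a p-value to the conventional asterisk notation."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""  # conventionally 'not significant'

print(significance_stars(0.03))    # *
print(significance_stars(0.0004))  # ***
```

Note how much information the lookup throws away: every p-value between .001 and .01, for instance, collapses into the same two asterisks.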
You might have noticed that I did not define the p-value, but instead launched into a description of the asterisks. That is because this is how many readers (and many study authors) process p-values – rather superficially. Whilst the audience generally knows the rule of thumb that a p-value less than .05 is ‘significant’, study authors often fail to explain what question the significance test is actually answering.
Such ‘black box’ approaches to communicating statistics do not allow the audience to really engage with the research findings: by waving our hand and saying “trust me that it’s important”, the reader does not have a good understanding of how or why the numbers are important, which makes it harder for the reader to determine the relevance of the findings to their own informational needs. Indeed, “p-values characterize only statistical significance, which bears no necessary relationship to practical significance or even to the statistical magnitude of the effect” (Lipsey et al., 2012, p. 3).
Most commonly, the significance value relates to a test of the null hypothesis that there is no effect or relationship beyond chance, so a significant result typically means that – statistically speaking – we can reject that null hypothesis. But this is not the same as saying that the observed effect is meaningful, and it does not tell us anything about variation (e.g., does the observed effect apply to all cases?).
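To see why rejecting the null hypothesis says nothing about magnitude, here is a minimal sketch (a two-sample z-test on made-up numbers, not any particular study): the very same tiny effect flips from ‘non-significant’ to ‘highly significant’ purely because the sample grows.

```python
import math

def z_test_p(mean_diff, sd, n):
    """Two-sided p-value for a two-sample z-test (equal group sizes and SDs)."""
    se = sd * math.sqrt(2.0 / n)              # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability

sd, mean_diff = 10.0, 0.5                     # Cohen's d = 0.05: a tiny effect
for n in (100, 10_000):
    p = z_test_p(mean_diff, sd, n)
    print(f"n = {n:>6}: d = {mean_diff / sd:.2f}, p = {p:.4f}")
```

The effect size stays at d = 0.05 throughout; only the p-value moves with the sample size.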
I hasten to add that there are other reasons why we might wish to abandon the p-value (or at least complement it with additional information). Lipsey et al. (2012) argue: “Statistical significance is a function of the magnitude of the difference between the means, to be sure, but it is also heavily influenced by the sample size, the within samples variance on the outcome variable, the covariates included in the analysis, and the type of statistical test applied” (p. 3). Several papers have discussed other statistical reasons why a p-value can be misinterpreted or lead to a false positive result (i.e., the analyses detect an effect that is not actually present). Particularly insightful and/or impactful papers on this issue include Colquhoun (2014) and Ioannidis (2005). At least one journal has made the bold move to ban the p-value significance test because of statistical concerns; see the Royal Statistical Society item discussing this ban.
So what are other ways of engaging the reader in interpreting your statistical results? Here are a few starting suggestions, but there are certainly others.
- Effect sizes and confidence intervals. Effect sizes focus on the magnitude and direction of the effect, while confidence intervals encourage correct interpretation (e.g., see Cumming 2013), perhaps because they require the reader to think about the range of possible values that an observed effect can take. It should be noted, however, that there are also ways to make effect sizes more interpretable for different audiences (e.g., see Thomas, Harden, & Newman, 2012).
- Converting back to the original metric. This involves presenting the findings in terms of what one would actually observe ‘in the real world’. For example, an intervention aimed at increasing vegetable intake could present the findings in terms of how many additional pieces of vegetables the average participant would consume after the intervention. This approach emphasises practical significance over statistical significance.
- Exploring variation. Whilst a mean effect or a correlation representing the strength of a relation is interesting, there is perhaps not enough attention paid to variation. Variation is the extent to which different data points (e.g., the responses from individuals) differ from the ‘average’ or ‘typical’ respondent. Some analyses might explore outliers and exclude or truncate them so that they do not unduly influence the analyses, but perhaps there is more that we could be doing with this information. The ‘variants’ could be particularly interesting to practitioners and decision-makers, rather than just being statistical nuisances. For instance, they could help us understand how the finding might apply to different people in our sample (and by inference, our population). Focusing on variation could be as simple as plotting the data points so that the reader can see how the individual data points differ from the mean or predicted values, or it could be more complex, involving subgroup and other statistical analyses to try to explain the variation. (Although note that this should not be seen as an endorsement of practices that lead to data dredging or p-hacking; see Simmons et al. (2011) for a definition. Explorations of variation should be purposive, well-justified, and, ideally, pre-specified).
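The first two suggestions can be combined in a small sketch (the vegetable-intake numbers below are invented for illustration): report the difference in the original metric, with an approximate 95% confidence interval, alongside a standardised effect size.

```python
import math
import statistics as stats

# Hypothetical daily vegetable portions, invented for illustration.
control = [2.1, 1.8, 2.5, 2.0, 2.3, 1.9, 2.2, 2.4, 2.0, 2.1]
treated = [2.9, 2.4, 3.1, 2.6, 3.0, 2.5, 2.8, 3.2, 2.7, 2.9]

diff = stats.mean(treated) - stats.mean(control)   # in the original metric
sp = math.sqrt((stats.variance(treated) + stats.variance(control)) / 2)
d = diff / sp                                      # standardised effect (Cohen's d)
se = sp * math.sqrt(1 / len(treated) + 1 / len(control))
ci = (diff - 1.96 * se, diff + 1.96 * se)          # approximate 95% CI

print(f"Mean difference: {diff:.2f} portions/day "
      f"(95% CI {ci[0]:.2f} to {ci[1]:.2f}); Cohen's d = {d:.2f}")
```

A practitioner can weigh ‘roughly two-thirds of a portion more per day, plus or minus so much’ far more easily than an asterisk.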
In conclusion, the “seductive but illusory certainty of a p-value cutoff” (Cumming, 2013, p. 12) is problematic for more than just statistical reasons. It discourages researchers and their audiences from truly thinking about what the significance test is testing. Moreover, beyond the initial excitement of discovering “yay – it’s statistically significant!”, audiences are not likely to be fully engaged by these values because the practical implications of the results are not always clear. Interpreting the results in terms of the likely ‘real-world’ implications or the variation in the dataset will help practitioners and decision-makers decide how the finding might apply to their context.
About the author:
Alison O’Mara-Eves is a Senior Researcher at the EPPI-Centre, Social Science Research Unit, UCL Institute of Education. She specialises in methods for systematic reviews and meta-analysis, and has been conducting systematic reviews for over 13 years. In this capacity, she has reviewed many thousands of primary studies, as well as conducting statistical analyses of her own, which has made her acutely aware of the challenges of communicating findings from statistical analyses. Her profile and publications can be found here.
Colquhoun D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1, 140216.
Cumming G. (2013). The new statistics: why and how. Psychological Science, 25, 7-29.
Ioannidis JP. (2005). Why most published research findings are false. PLoS Medicine, 2, e124.
Lipsey, M.W., Puzio, K., Yun, C., Hebert, M.A., Steinka-Fry, K., Cole, M.W., Roberts, M., Anthony, K.S., Busick, M.D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. (NCSER 2013-3000). Washington, DC: National Center for Special Education Research, Institute of Education Sciences, U.S. Department of Education.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366
Thomas J, Harden A, and Newman M. (2012). Synthesis: Combining results systematically and appropriately. In Gough, Oliver, and Thomas (eds.), An introduction to systematic reviews. London: Sage.
Image Credits: © Hilda Bastian (CC-BY-NC-ND).
posted on December 05, 2016 15:20
Gillian Stokes and Sergio Graziosi - Blog Editors.
This is the launch post for the new EPPI-Centre blog: we provide a brief introduction of the topics we are planning to cover and the general aims of the blog.
Welcome to the EPPI-Centre blog
Those of you who have worked with the EPPI-Centre, or have read one of our many publications over the last 23 years of operation, will recognise the EPPI-Centre as an organisation that provides internationally impactful evidence on matters of health, social care, international development, education and policy. For those of you who are unfamiliar with who we are, the Evidence for Policy and Practice Information and Co-ordinating Centre, or EPPI-Centre, is part of the Social Science Research Unit at the Department of Social Science, UCL Institute of Education, University College London.
We are committed to informing policy and professional practice with rigorous evidence. Our two main areas of work are systematic reviews and research use. Our systematic review work comprises a variety of research endeavours, including developing research methods for systematic reviews and research syntheses, conducting reviews, supporting others to undertake reviews, and providing guidance and training in this area. With regard to research use, we study the use (and non-use) of research evidence in personal, practice and political decision-making, and support those who wish to find and use research to help solve problems in a wide variety of disciplines; here, too, we provide guidance and training.
Why are we opening a new blog?
One of our defining interests revolves around public engagement, and we are keen to open new channels of communication, especially if they cut across the boundaries of the academic “ivory tower”. We have been running open seminars since the beginning of 2015, and they have been very well received. Speakers include researchers from the EPPI-Centre, as well as researchers from a wide range of world-class institutions. (Click here for forthcoming events, or see an overview of our past seminars and associated resources.) Furthermore, since April 2011 we have provided headlines about our work via our Twitter feed. Twitter has proven useful for engaging a large audience; however, we felt it was time to provide a platform for discussing our research in more detail. The blog is intended to let us expand on our research findings and methodological ideas, and to open a dialogue between researchers and readers, so that readers can engage with our research and explore it further. It will also provide a channel to test and refine our current thinking in a less formal, more inclusive medium than the traditional outlets of conferences and peer-reviewed publications.
The EPPI-Centre benefits from having research staff from a variety of backgrounds: medicine, education, statistics, media, and economics, to name but a few. This multidisciplinary expertise has greatly benefited our research work, for example by providing insight into, and understanding of, working practices and policy. Here on the EPPI-Centre blog our researchers will be able to share their work and expand on their thoughts and ideas with interested readers. We want to provide you with thought-provoking and informative posts that will encourage debate, not just disseminate reports and journal articles. Most of all, we want to blog in order to challenge our approach to our work and explore the issues that we encounter within our research. Thus, the blog offers us a way to explore new lines of thinking and engage with our audiences in new and productive ways.
What can you look forward to reading about on the EPPI-Centre blog?
- News and reviews – we will keep you informed of new review work, articles and books that we have published, as soon as they have been released.
- Exploratory essays - thoughts about ideas and lines of research that we find worth pursuing, thinking about and discussing with interested parties.
- Training and workshop session updates – we will let you know about training days or workshops that we are running and write about key points emerging from the sessions for those unable to attend.
- Conference news – we will let you know about upcoming conferences that we are running or speaking at, by providing you with key dates for your diaries and links to enable you to register.
- Projects and trials of interest - we will also post links to works by our own researchers and others to inform you of trials, projects, or other developments that you might find useful that relate to our work.
We think that you will find the blog a great way to interact with us here at the EPPI-Centre and hope that you join us regularly for updates of our published work as well as information about our plans for the remainder of 2016 and beyond.
Please also follow us on Twitter – we look forward to hearing from you in the coming months and engaging in online discussions!
Gillian Stokes is a Research Officer at the EPPI-Centre, UCL Institute of Education, University College London. Her main research interests include developing research methods and public and patient involvement in research, particularly children’s involvement in translational medicine. She has been working on systematic reviews focused on health and medicine since May 2013.
Sergio Graziosi is the Information Systems Manager at the EPPI-Centre, UCL Institute of Education, University College London. His main research interests revolve around the use of technology in systematic reviews as well as more generally the challenges and limitations of research synthesis.