A systematic review of the impact of summative assessment and tests on students' motivation for learning

Summary

Background

The current widespread use of summative assessment and tests is supported by a range of arguments. The points made suggest that tests indicate standards to be aimed for and enable these standards to be monitored, and also raise standards. Proponents claim that tests cause students, as well as teachers and schools, to put more effort into their work because of the rewards and penalties that can be applied on the basis of the test results. In opposition to these arguments is the claim that increase in scores is mainly the consequence of familiarisation with the tests and of teaching directed specifically towards answering the questions, rather than developing the skills and knowledge intended in the curriculum. It is argued that tests motivate only some students and increase the gap between higher and lower achieving students; moreover, tests motivate even the highest achieving students towards performance goals rather than learning goals, as required for continuing learning.

In the past decade, testing has burgeoned in many countries; this systematic review was prompted by concern to identify the impact of summative assessment on students' motivation for learning. Whilst the impact of testing on teachers, teaching and students' achievement has been well researched and represented in reviews of research, much less attention has been given to its impact on the affective and cognitive (mental activity) outcomes of education. The current widely embraced aim of developing in today's students the capacity to continue into lifelong learning means that, if some assessment practices are reducing motivation for learning, there is clearly a cause for concern. The purpose of the review was therefore to identify and synthesise research evidence about the impact of summative assessment on motivation for learning.

Definition of terms

Assessment is a term that covers any activity in which evidence of learning is collected in a planned and systematic way, and is used to make a judgement about learning. If the purpose is to help in decisions about how to advance learning and the judgement is about the next steps in learning and how to take them, then the assessment is formative in function. If the purpose is to summarise the learning that has taken place in order to grade, certificate or record progress, then the assessment is summative in function. When summative assessment is used for making decisions that affect the status or future of students, teachers or schools (that is, 'high stakes'), the demand for reliability of measures often means that tests are used in order to control closely the nature of the information and the conditions in which it is collected.

Motivation is a complex concept concerned with the drive, incentive or energy to do something. Motivation is not a single entity but embraces, for example, effort, self-efficacy, self-regulation, interest, locus of control, self-esteem, goal orientation and learning disposition. Learning, too, is a complex phenomenon that cannot be conceived as a single entity but is best understood as a field or as an ecological composite. The American Psychological Association's Learner Centered Psychological Principles include 14 factors that influence learning and learners. These include cognitive and metacognitive factors, motivational and affective factors, developmental and social factors, and individual difference factors. Thus for the purpose of this review, motivation for learning is understood to be a form of energy which is experienced by learners and which drives their capacity to learn, adapt and change in response to internal and external stimuli. It is closely identified with the 'will to learn', which determines the effort that a learner will put into a task.

There are different ways in which the energy or the will to learn can be motivated and it is particularly important to distinguish between intrinsic and extrinsic motivation. Those who learn in order to gain an extrinsic reward are unlikely to continue learning once the reward is obtained or the penalty avoided, and they will give up earlier if the reward seems unobtainable. For continued learning, the motive needs to be intrinsic, the reward being in the process of learning and in the recognition of being in control of, and responsible for, one's own learning.

Aims of the review and review questions

Aims

The aims of the review were as follows:

To conduct a systematic review of research evidence of the impact of summative assessment and testing on students' motivation for learning
To determine the conditions and processes (including teaching) associated with summative assessment and testing which have a positive or a negative impact on students' motivation for learning
To identify actions that could be taken to increase the positive and decrease the negative impact of assessment on students' motivation for learning
To consider evidence relating motivation for learning to learning achievements and learning strategies
To make recommendations for policy and practice based on these findings
To identify questions that need to be addressed by research so that decisions on policy and practice in summative assessment can be evidence-based.

Review questions

Thus the review was designed to identify and synthesise research relevant to the question:

What is the evidence of the impact of summative assessment and testing on students' motivation for learning?

In order to achieve all the aims of the review, it was necessary to address the further questions:

How does any impact vary with the characteristics of the students and the conditions of the assessment or testing?
In those studies where impact on students has been reported, what is the evidence of impact on teachers and teaching?
What actions in what circumstances would increase the positive and decrease the negative impact on students of summative testing and assessment programmes? In particular, what is the evidence that any impact is increased by 'raising' the stakes?
What are the implications of these findings for assessment policy and practice?

Methods

The review was conducted using the procedures for systematic review of research in education being developed by the EPPI Centre. A wide-ranging search was carried out for studies, written in English, of assessment for summative purposes in schools for students between the ages of 4 and 19, and which reported on aspects of students' motivation for learning. The search for studies involved searching relevant electronic databases and journals online, following up citations in other reviews, handsearching journals held in the library and using personal contacts. Inclusion and exclusion criteria were applied to abstracts before full texts were read and categorised, using a core set of keywords and additional keywords specific to the review. This process resulted in some further studies being excluded. The remaining studies were analysed in depth using the Guidelines for Extracting Data and Assessing Quality of Primary Studies in Educational Research, Version 0.94 (EPPI Centre, 2001). Judgements were made as to the weight of evidence relevant to the review provided by each study.

Lengthy consideration was given to ways in which the findings of different studies could be brought together to form conclusions. None of the studies dealt with all the variables included in the concept of motivation for learning but they could be grouped according to the particular outcomes that were investigated. These outcomes fell into three distinct groups, central to motivation for learning. Expressed from a learner's perspective, these are as follows:

What I feel and think about myself as a learner
The energy I have for the task
How I perceive my capacity to undertake the task.

The findings relating to the main review question are reported under these headings. Judgements were made about each study in relation to methodological soundness, appropriateness of the study type and relevance to the focus of the review. In the synthesis, greater weight was accorded to those studies rated most highly on these counts.

Results

The initial search resulted in the identification of 183 potentially relevant studies. The systematic review process involved excluding some studies at various stages, for reasons that were documented. This resulted in 19 studies being identified as directly addressing the review question; these included 13 outcome evaluations (three randomised controlled trials, three case control designs, three post-test and four of other designs), three descriptive studies and three process evaluations.

Evidence of impact

Between them, the identified studies considered a number of the component aspects of motivation, but none considered all. The following main findings emerged from studies providing high-weight evidence:

After the introduction of the National Curriculum tests in England, low-achieving pupils had lower self-esteem than higher-achieving pupils, whilst beforehand there was no correlation between self-esteem and achievement.
When passing tests is high stakes, teachers adopt a teaching style which emphasises transmission teaching of knowledge, thereby favouring those students who prefer to learn in this way and disadvantaging and lowering the self-esteem of those who prefer more active and creative learning experiences.
Repeated practice tests reinforce the low self-image of the lower-achieving students.
Tests can influence teachers' classroom assessment, which may be interpreted by students as purely summative, regardless of the teacher's intentions, possibly as a result of teachers' over-concern with performance rather than process.
Students are aware of a performance ethos in the classroom and that the tests give only a narrow view of what they can do.
Students dislike high-stakes tests, show high levels of test anxiety (particularly girls) and prefer other forms of assessment.
Teachers have a key role in supporting students to put effort into their learning activities.
Feedback on assessments has an important role in determining further learning. Students are influenced by feedback from earlier performance on similar tasks in relation to the effort they invest in further tasks.
Teacher feedback that is ego-involving rather than task-involving can influence the effort students put into further learning and their orientation towards performance rather than learning goals.
High-stakes assessment can create a classroom climate in which transmission teaching and highly structured activities predominate and which favour only those students with certain learning dispositions.
High-stakes tests can become the rationale for all that is done in classrooms, permeating teacher-initiated assessment interactions.
Goal orientations are linked to effort and self-efficacy.
Teacher collegiality is important in creating an assessment ethos that supports students' feelings of self-efficacy and effort.
An education system that puts great emphasis on evaluation produces students with strong extrinsic orientation towards grades and social status.

Evidence from studies providing medium-weight evidence

The state-mandated tests in the US lower self-esteem for 'at risk' students.
Low-achieving children can improve their achievement with the help of supportive teachers or other adults.
Interest and effort are encouraged in classrooms which encourage self-regulated learning by providing students with an element of choice, control over challenge and opportunities to work collaboratively.
Teachers can influence the criteria that students use in self-assessment of their work.

Differences relating to age, gender and level of achievement

Older students (that is, aged 11 and above) are more likely than younger ones to understand simple grades; they are less likely to report teachers' grades as being fair but attach more importance to them. Older students are more likely to attribute relative success to effort and ability, whilst younger ones attribute it to external factors or practice. Older students are also more likely to focus on performance outcomes (that is, scores and levels) rather than learning processes.

Lower-achieving older students are more likely than younger ones to minimise effort and respond to tests randomly or by guessing. There was no evidence of age differences in test-taking strategies (checking, monitoring time, etc). Instead of motivation and test familiarity increasing with age, older students feel more resentment, anxiety, cynicism and mistrust of standardised achievement tests.

Lower-achieving students are doubly disadvantaged by summative assessment. Being labelled as failures not only affects their current feelings about their ability to learn, but also lowers further their already low self-esteem and reduces the chance of future effort and success. Only when low achievers have a high level of support (from school or home), which shows them how to improve, do some escape from this vicious circle.

Results of tests which are 'high stakes' for individual students, such as the 11+ in Northern Ireland, have been found to have a particularly strong and devastating impact on those who receive low grades. However, tests which are high stakes more for schools than for students (such as the National Curriculum tests in England and state-mandated tests in the US) have scarcely less impact. Students are aware of repeated practice tests and the narrowing of the curriculum, and only those confident of success enjoy the tests. In taking tests, high achievers are more persistent, use appropriate test-taking strategies and have more positive self-perceptions than low achievers. Low achievers become overwhelmed by assessments and demotivated by constant evidence of their low achievement, thus further increasing the gap between low- and high-achieving students. A greater emphasis on summative assessment thus brings about increased differentiation.

Girls were reported as expressing more test anxiety than boys. Girls also make more internal attributions of success or failure than boys, with consequences for their self-esteem.

The effect of the conditions of testing

The conditions that affect the impact of summative assessment relate to the degree of self-efficacy of students, the extent to which their effort is intrinsically or extrinsically motivated, the encouragement of self-regulation and self-evaluation, and the pressure imposed by adults outside the school. Feedback has a central role since self-efficacy is judged from performance in previous tasks of the same kind. If students have experienced success in earlier performance, they are more likely to feel able to succeed in a new task. Feedback that focuses on the task is associated with greater interest and effort, whereas feedback that is ego-involving rather than task-involving is associated with an orientation to performance goals.

Teachers' own class-testing practices can help to increase self-efficacy, if teachers explain the purpose and expectations of their tests and provide feedback. Further, a school's 'assessment culture' influences students' feelings of self-efficacy and effort. Collegiality - meaning constructive discussion of testing and the development of desirable assessment practice in the school - has a positive effect, whilst a focus on performance outcomes has a negative effect.

The degree to which learners are able to regulate their own learning also appears to favour students' interest and to promote focus on the intrinsic features of their work. Students who have some control over their work, through being given choice and encouragement to evaluate it, value the significant content features of their work rather than whether it is correct or not. Thus classrooms that allow more self-regulation promote change in the criteria students use in self-evaluation.

When test scores are a source of pride to parents and the community, pressure is brought to bear on the school for high scores. Similarly, parents bring pressure on their children when the result has consequences for attendance at high social status schools. For many students, this increases their anxiety, even though they recognise their parents as being supportive.

The effect on teachers and teaching

High-weight evidence from studies reporting on the effect of tests on teachers and teaching in addition to impact on students' motivation indicates that when passing tests is high stakes, teachers adopt a teaching style which emphasises transmission teaching of knowledge, thereby favouring those students who prefer to learn in this way and disadvantaging and lowering the self-esteem of those who prefer more active and creative learning experiences. External tests have a constricting effect on the curriculum, resulting in an emphasis on subjects tested at the expense of creativity and personal and social development. High-stakes tests often result in a great deal of time being spent on practice tests, the valuing of test performance and undervaluing of other student achievements, with teachers' own assessment becoming summative in function rather than formative.

Increasing the positive and decreasing the negative impact

Although the study findings pointed to negative impacts of summative assessment on aspects of motivation for learning, they also indicated ways in which these could be ameliorated so that learners as well as teachers can benefit from summative assessment. High-weight evidence suggests that practice in summative assessment could be improved by the following:

Promoting learning goal orientation rather than performance orientation
Cultivating intrinsic interest in the subject and putting less emphasis on grades
Teaching approaches that encourage self-regulated learning (including collaboration among students) and cater for a range of learning styles
Providing explanations of the purpose of assessment and providing feedback that can help further learning
Establishing a school climate of constructive discourse about assessment among teachers, and between teachers and students
Developing a constructive and supportive school ethos in relation to tests
Ensuring that the demands of the tests are consistent with the expectations of teachers and the capabilities of the students
Involving students in decisions about testing
Developing students' self-assessment skills and use of learning rather than performance criteria as part of a classroom environment that promotes self-regulated learning
Using assessment to convey a sense of learning progress to students.

Implications for assessment practice and policy

In order to explore the implications of the review as fully as possible, the review methodology included a consultation conference with invited policy-makers and practitioners. Some of the messages below are derived directly from the research studies whilst others emerged from discussion of the review findings and reference to current practice in the UK at the consultation conference.

Practice

Reduce the narrowing impact on the curriculum and on teaching methods by professional development that emphasises learning goals and learner-centred teaching approaches.
Share and emphasise learning goals rather than performance goals with students and provide feedback to students in relation to these goals.
Share in developing and implementing a school-wide policy that includes assessment both for learning (formative) and of learning (summative), and ensure that the purpose of all assessment is clear to all involved, including parents and students.
Develop students' understanding of the goals of their learning, the criteria by which they are assessed and their ability to assess their own work.
Implement strategies for encouraging self-regulation in learning and positive inter-personal relationships.
Avoid comparisons between students based on test results.
Present assessment realistically, as a process which is inherently imprecise and reflexive, with results that have to be regarded as tentative and indicative rather than definitive.

Policy

Recognise that, by reducing motivation, current high-stakes testing is providing information about students' attainment that is of questionable validity.
Recognise the importance of the various components of motivation for students' attainments in education. Empirical evidence shows that these are positively related to attainment. For example, the OECD/PISA study (OECD, 2001) provides firm evidence that achievement of literacy is positively related to students' interest in their learning, the extent to which their learning strategies help them to develop understanding through linking to existing knowledge instead of just memorising, and the extent to which they feel in control of their learning.
Provide professional development, particularly for senior school management, aimed at enabling schools to develop a range of assessment strategies and to use summative information of different kinds for improving the learning of their students. Current training focuses too narrowly on the use of test scores, accountability and target setting; it needs to be more learner-focused.
For summative purposes in reporting on individual students, move towards testing students when their teachers judge them to be ready to show their achievement at a certain level, thus minimising experience of failure and its impact on self-esteem.
Ensure that the criteria used in school evaluation (including self-evaluation) make explicit reference to a full range of subjects; include moral, spiritual and cultural as well as cognitive aims; and range across an appropriate variety of teaching methods and learning outcomes.
Develop schools' self-evaluation practices, including teachers' assessment skills, through targeted professional development.
For tracking national standards, sample students rather than test all and use a wider range of test forms and items.
Quantify the 'cost' of current practice, including teaching time taken up with testing and practice testing and the additional workload for teachers of extra marking, in addition to the cost of the tests and their development.
Use test development expertise to create new tests and assessment that will enable all valued outcomes of education, including creativity and learning to learn, to be assessed.
Reduce the 'stakes' of summative assessment by avoiding comparisons among schools in terms of test results and end the practice of basing targets only on test results.

Other outcomes of the review were the identification of further research required in this area, particularly to extend the research base in relation to outcomes of education that are particularly important for lifelong learning, and a clarification and development of the methodology of systematic reviewing applied to educational research.

References

EPPI Centre (2001) Guidelines for extracting data and quality assessing primary studies in educational research (version 0.94). London: EPPI Centre, Social Science Research Unit

OECD (2001) Knowledge and skills for life: first results from PISA 2000. Paris: OECD

This report should be cited as: Harlen W, Deakin Crick R (2002) A systematic review of the impact of summative assessment and tests on students' motivation for learning. In: Research Evidence in Education Library. London: EPPI Centre, Social Science Research Unit, Institute of Education, University of London.