A systematic review of the evidence of the impact on students, teachers and the curriculum of the process of using assessment by teachers for summative purposes

Summary

Background

This fourth review by the Assessment and Learning Research Synthesis Group (ALRSG) was designed to complement the previous review, which concerned the reliability and validity of teachers' assessment for summative purposes. These reviews take place at a time of continued interest, at the highest levels of policy in the United Kingdom (UK), in giving a greater role to summative assessment by teachers. Recent evidence of this interest is found in the Primary National Strategy from the Department for Education and Skills (DfES), the commissioning by the Qualifications and Curriculum Authority (QCA) in England of a review of Experiences of Summative Teacher Assessment in the UK, the acceptance by the Welsh Assembly of the report of the Daugherty Assessment Review Group in Wales, the Assessment is for Learning project in Scotland, and the draft proposals of the Tomlinson Working Group on 14-19 education reform.

Assessment by teachers has the potential for providing summative information about students' achievement since teachers can build up a picture of students' attainments across the full range of activities and goals. Other benefits claimed include: less pressure on students and teachers compared with external tests and examinations; greater freedom of teachers to pursue learning goals in ways best suited to their students; the potential for information about students' ongoing achievements to be used formatively, to help learning, as well as for summative purposes; and the avoidance of the negative impact of tests on students' motivation for learning revealed by the first ALRSG review. However, there are concerns about teachers' summative assessment relating to possible interference with the relationship between teacher and students, teachers' workload, and the need to ensure the quality and reliability of the outcomes. Experience in systems where teachers' assessment is successfully used for summative assessment shows that these problems can be overcome and confirms that there are benefits for students and teachers. This review was carried out to bring research evidence to bear on these different claims and experiences.

Definition of terms

Assessment in the context of education involves deciding, collecting and making judgements about evidence relevant to the goals of the learning. How these processes are carried out depends on the purposes of the assessment. The term 'summative assessment' refers to an assessment with a particular purpose - that of providing a record of a student's overall achievement in a specific area of learning at a certain time. It is the purpose that distinguishes it from assessment described as formative, diagnostic or evaluative, rather than any particular method of gathering information about students' performance.

Although teachers inevitably have a role in any assessment, the term 'assessment by teachers' (teachers' assessment, often abbreviated to TA) is used for assessment where the professional judgement of teachers has a significant role in drawing inferences and making judgements, as well as in gathering evidence for assessment. Thus the definition of assessment by teachers for summative purposes used in this review, as in the previous review, is that it refers to any activity in which teachers gather evidence in a planned and systematic way about their students' learning in order to draw inferences, based on their professional judgement, to report achievement at a particular time.

This excludes assessment where teachers gather information that is marked by others, or teachers' involvement in setting or marking external examinations or tests.

Review questions

Considerations of the policy and practice background to this review led to the identification of the main review question as:

What is the impact on students, teachers and the curriculum of the process of using assessment by teachers for summative purposes?

To achieve its aims the review addressed the subsidiary question:

What conditions and contexts affect the nature and extent of the impact of using teachers' assessment for summative purposes?

The findings are used to address the further question:

What are the implications of the findings for policy and practice in summative assessment?

The outcomes of the review are as follows:

The production of a map of studies reporting on the impact of using teachers' assessment for summative purposes on students, teachers and the curriculum
The identification of the implications of the findings for different user groups, principally practitioners and policy-makers
Publication of the full report and of short summaries for different user groups in the Research Evidence and Education Library (REEL)
Identification of further research that is needed in this area.

Methods

The review methodology followed the procedures devised by the EPPI Centre and was carried out with its technical support. Criteria were defined for guiding a wide-ranging search for studies that dealt with some form of summative assessment conducted by teachers, involving students in school in the age range 4 to 18 and reporting on the impact of the teachers' assessment on students, teachers or the curriculum. Bibliographic databases and registers of educational research were searched online as were relevant online journals, with other journals and back numbers of some of those available online being searched by hand. Other studies were found by scanning the reference lists of already-identified reports, making requests to members of relevant associations and other review groups, and using personal contacts.

All studies identified in these ways were screened, using inclusion and exclusion criteria, and the included studies were then keyworded, using the EPPI Centre Core Keywording Strategy (EPPI Centre, 2002a) and additional keywords specific to the context of the review. Keywords were used to produce a map of selected studies. Detailed data extraction was carried out online, using EPPI Centre generic and review-specific data extraction guidelines (EPPI Centre, 2002b). Two reviewers worked independently before comparing entries and reaching a consensus. Judgements were made as to the weight of evidence relevant to the review provided by each study in relation to methodological soundness, appropriateness of the study type and relevance of the focus to the review questions.

The structure of the synthesis was based on the impact reported by the studies. Potentially there were three main headings, bringing together the findings for impact on students, teachers and the curriculum. However, since few studies were found that dealt with the curriculum - and those that did were also concerned with impact on teachers - these were combined into one group of impact on teachers and the curriculum. Within these two broad groups, formed according to the nature of the impact reported, there are important subdivisions according to whether the assessment is used for internal school purposes only (such as grades and routine school tests and examinations) or for use by others outside the school (as in the case of certification, selection, transfer or the accountability of the school). Although it was often the case that an assessment was used for both internal and external purposes, it was considered helpful to try to identify the main use and to discuss studies under these subheadings.

Potential users of the review, represented in the Review Group, were involved in several ways: providing advice during and between Group meetings; providing information about studies; participating in keywording and data extraction; and commenting on draft findings and implications to be drawn from them.

The ALRSG includes the following users: a secondary school deputy head teacher with responsibility for assessment, a local authority primary adviser and a project director of the National College of School Leadership. Two members of the group are members of AAIA, another led the review of assessment in Wales and another is Director of the Learning to Learn project of the ESRC's Teaching and Learning Research programme. Eight of the Review Group are members of the Assessment Reform Group and, through this, the Review Group has an ongoing relationship with the DfES and with the QCA.

Results

Identification of studies

The search for studies resulted in 343 papers being found, of which 301 were excluded in either a one-stage or a two-stage screening process, using inclusion and exclusion criteria. Full texts were obtained for 26 of the remaining 42 papers, from which a further two were excluded during keywording. One paper was found to be linked to another and one of these papers was then excluded as a separate item, as it was based on the same set of data. This left 23 studies after keywording. All these were included in the systematic map and in-depth review.
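The arithmetic behind these counts can be checked with a short sketch. The figures are those reported above; the function and field names are illustrative assumptions, and the number of papers for which full texts were not obtained is derived here rather than stated in the text.

```python
def screening_flow(found, excluded_screening, full_texts,
                   excluded_keywording, linked_duplicates):
    """Trace paper counts through screening and keywording.

    All parameter names are hypothetical labels for the figures
    reported in the text; they are not EPPI Centre terminology.
    """
    remaining = found - excluded_screening
    if full_texts > remaining:
        raise ValueError("more full texts than papers remaining after screening")
    # Derived value: papers that passed screening but whose full text was not obtained.
    not_obtained = remaining - full_texts
    # Studies left after removing keywording exclusions and linked papers.
    included = full_texts - excluded_keywording - linked_duplicates
    return {
        "remaining_after_screening": remaining,
        "full_text_not_obtained": not_obtained,
        "included": included,
    }


# Figures as reported in the paragraph above.
flow = screening_flow(
    found=343,
    excluded_screening=301,
    full_texts=26,
    excluded_keywording=2,
    linked_duplicates=1,
)
print(flow)
```

With the reported figures, the included count matches the 23 studies carried forward to the systematic map and in-depth review.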

Systematic map

The 23 studies included in the in-depth review were mapped in terms of the EPPI Centre and review-specific keywords. All were written in the English language; 12 were conducted in England, nine in the United States and one each in New Zealand and Hong Kong.

All studies were concerned with students between the ages of 4 and 18. Eleven involved primary school students (aged 10 or below) only, six involved secondary students (aged 11 or above) only, and five were concerned with both primary and secondary students. A slightly larger proportion of studies conducted in primary schools reported impact on teachers compared with those conducted in secondary schools. About 70% of studies in secondary schools and about 80% in primary schools were concerned with assessment of English, while 43% and 60% respectively were concerned with assessment of mathematics.

Twenty studies were classified as involving assessment of work as part of, or embedded in, regular activities. Three were classified as portfolios, two were classified as projects and eight were either set externally or set by the teacher to external criteria. The most common use of the assessment in the studies was for internal school purposes, with four studies related to assessment for certification and another three to external purposes that had high stakes for the school.

In-depth review and synthesis

Seven of the 23 included studies provided evidence of high weight for the review. Six of these provided information about impact on students; three also provided information about impact on teachers. Of the 12 studies providing evidence of medium weight, all except one provided evidence of impact on teachers, whilst five provided information of impact on students.

Findings from studies relating to impact of teachers' assessment on students

When teachers' assessment is used for external purposes, there is high-weight evidence of the following:

Older students respond positively to summative assessment of their coursework by teachers, finding the work motivating and being able to learn during the assessment process (Bullock et al., 2002).
Students need more help, in the form of better descriptions and examples, to understand the assessment criteria and what is expected of them in meeting these criteria (Bullock et al., 2002; Iredale, 1990; Stables, 1992).
The impact of summative teacher assessment on students depends on the high-stakes use of the results (Yung, 2002).
The impact of summative teachers' assessment on students will be affected by the way teachers interpret their roles as assessors and by their orientation towards improving the quality of students' learning or maximising their marks (Bullock et al., 2002; Yung, 2002).

There is medium-weight evidence of the following:

Teachers consider that young students may not do their best work when constrained by an external task (Abbott et al., 1994).

When teachers' assessment is used for internal purposes, there is high-weight evidence in relation to impact on students as follows:

Feedback from earlier assessment impacts on the effort that students apply in further tasks of the same kind; effort is motivated by non-judgemental feedback that gives information about how to improve (Brookhart and DeVoge, 1999; Carter, 1997/8).
The way in which teachers present classroom assessment activities may affect students' orientation to learning goals or performance goals (Brookhart and DeVoge, 1999).
Changing teachers' assessment practices to include processes and explanations can lead to better student learning (Flexer et al., 1995).
Using grades as rewards and punishments is harmful to students' learning by encouraging extrinsic motivation (Iredale, 1990; Pilcher, 1994).

There is medium-weight evidence of the following:

Teachers' own unguided grades are influenced by non-achievement factors, such as students' behaviour, effort and attendance; this disadvantages some students (Bennett et al., 1993; Cizek et al., 1995/6).
The introduction of teachers' assessment related to levels of the National Curriculum in England and Wales was perceived by teachers as having a positive impact on students' learning experiences (Hall et al., 1997).

Findings from studies relating to impact of teachers' assessment on teachers and the curriculum

When teachers' assessment is used for external purposes there is high-weight evidence of the following:

Teachers vary in how they respond to being given the role of assessor and the approach they take to interpreting external assessment criteria; strict adherence to the regulations leads them to be less concerned with students as individuals (Morgan, 1996; Yung, 2002).

There is medium-weight evidence of the following:

The impact on teaching of external assessment requirements depends on the value that teachers find in the information they gain about their students through the assessment (Abbott et al., 1994; Bennett et al., 1992; Koretz et al., 1994).
Assessment for external purposes adversely affects teachers when it is seen as taking up too much time from teaching (Abbott et al., 1994; Bennett et al., 1992).

When teachers' assessment is used for internal purposes, there is high-weight evidence in relation to impact on teachers and the curriculum as follows:

The introduction of assessment techniques that require students to think more deeply leads to changes in teaching that extend the range of students' learning experiences (Flexer et al., 1995).
Close external control of teacher assessment inhibits teachers from gaining detailed knowledge of their students (Johnston et al., 1993).

There is medium-weight evidence of the following:

When teachers' assessment is built into their planning, the process has a positive impact on teaching and learning. This impact is further enhanced by professional collaboration at the school level (Hall et al., 1997; Hall and Harding, 2002).
Assessment by teachers indicates where learning opportunities for their students need to be extended (Valencia and Au, 1997; Whetton et al., 1991).
In a low-stakes context, the process of summative assessment by teachers helps them to clarify the meaning of learning outcomes (Valencia and Au, 1997).
The value of teachers' summative assessment to potential users depends on teachers internalising the nature of progression in relation to the learning goals (Cizek et al., 1995/6; Hill, 2002).

Findings in relation to the conditions and contexts affecting the nature and extent of the impact of using teachers' assessment for summative purposes

There is both high- and medium-weight evidence of the following:

New assessment practices are likely to have a positive impact on teaching if teachers find them of value in helping them to learn more about their students and to develop their understanding of curriculum goals; time to experience and develop some ownership of practices enhances their positive impact (Abbott et al., 1994; Bennett et al., 1992; Flexer et al., 1995; Gipps and Clarke, 1998; Koretz et al., 1994).
When high-stakes judgements are associated with teachers' assessment, one effect is for teachers to reduce assessment tasks to routine events and restrict students' opportunities for learning from them; high stakes encourage some teachers to give high grades where there is doubt, which may not be in the students' interests (Bullock et al., 2002; Hall and Harding, 2002; Morgan, 1996; Yung, 2002).
Shared criteria for assessing specific aspects of achievement lead to positive impact on students and on teaching; in the absence of such guidance, there is little positive impact on teaching and a potential negative impact on students (Bennett et al., 1993; Cizek et al., 1995/6; Hall et al., 1997; McCallum et al., 1993; Pilcher, 1994).
The process that teachers use in setting assessment tasks and in grading impacts on students' motivation for learning, particularly their goal orientation, when grades are used as rewards or punishments; the negative impact can be alleviated by ensuring that students have a firm understanding of assessment processes and criteria (Brookhart and DeVoge, 1999; Bullock et al., 2002; Iredale, 1990; Stables, 1992).
Summative assessment by teachers has a more positive impact on teachers and teaching when integrated into practice than when concentrated at a certain occasion (Bennett et al., 1993; Bullock et al., 2002; Carter, 1997/8; Hall et al., 1997; Iredale, 1990; Johnston et al., 1993; Koretz et al., 1994; McCallum et al., 1993; Whetton et al., 1991).
Opportunities that enable teachers to share and develop their understanding of assessment procedures enable them to review their teaching practice and their view of students' learning and of subject goals; such opportunities need to be sustained over time and should preferably include provision for teachers to work collaboratively across as well as within schools (Flexer et al., 1995; Gipps and Clarke, 1998; Hall et al., 1997; Hall and Harding, 2002; Hiebert and Davinroy, 1993; Valencia and Au, 1997).

Conclusions

Strengths and limitations of the review

The strengths of the review emanate from its systematic and collaborative procedures. The documentation of searches and of inclusion and exclusion decisions enables the work to be extended at a later date without duplication. All critical decisions about inclusion, exclusion and weight of evidence were taken by at least two people working first independently and then reconciling any differences in judgements. The main limitations in relation to procedures arise from the search being confined to studies published in English and available either online, in the university library or via inter-library loan. The findings are limited to some extent by the small number of studies found that provided evidence of high weight in relation to the review questions.

Implications for policy

Summative assessment by teachers has the potential for positive effects on students and on teachers, without the negative effects associated with external tests and examinations.
Using teachers' assessment for summative purposes can support valid assessment of key learning processes as well as assessment of learning outcomes related to higher-level cognitive skills.
Summative assessment by teachers has most benefit when teachers use evidence gathered over a period of time and with appropriate flexibility in choice of tasks rather than from an event taking place at a particular time. This enables information to be used formatively to adapt teaching as well as summatively.
Using the results of student assessment for high-stakes school accountability reduces the validity of the assessment, whether this is conducted by teachers or by external tests and examinations.
Introducing new assessment practices can support beneficial change in teaching, providing that the techniques are well matched to learning goals and illustrate how students can be required to use important conceptual knowledge and learning skills.
Regulations for teachers' summative assessment should allow teachers opportunities to assimilate summative assessment into their practice and to design appropriate classroom programmes. When changes are made in assessment practices, time must be allowed for this assimilation to happen.

Implications for practice

The following actions are likely to increase the benefit of teachers undertaking summative assessment of their own students:

At all stages and for all purposes, students should be helped to understand the criteria by which their work is assessed. This is likely to mean providing and discussing examples that illustrate the practical meaning of the criteria.
Teachers should make explicit to all concerned - colleagues, parents and students - the basis of the marks and grades they assign for internal school purposes. Achievement grades should not be influenced by non-academic factors, such as behaviour and participation, which should be reported separately as appropriate.
When presenting assessment tasks to students, teachers should emphasise learning outcomes and not the attainment of a high grade, thus avoiding the encouragement of extrinsic motivation which leads to shallow learning.
Teachers should internalise the progression in skills and understanding they aim to help students develop and interpret student performance in these terms rather than use a checklist of specific unconnected behaviours. In this way summative assessment helps teachers' understanding of learning goals and facilitates more detailed knowledge of their students.
Schools should set aside time for teachers to discuss assessment issues, plan assessments and moderate their judgements of students' work. This not only improves the reliability of the assessment but enables teachers to use the process of summative assessment to help teaching and learning.

Implications for research

The low number of studies found that met the inclusion criteria for this study, with only seven providing evidence of high weight, leads to an obvious implication that more high-quality research is needed in this area. Given the interest at high levels in government in making greater use of teachers' assessment in summative assessment, indicated in the Background section at the beginning of this chapter, there is some urgency in meeting the need for more research.

Particular research foci suggested by this review are as follows:

How teachers manage the dual roles as teacher and assessor
The impact on students and on other uses of assessment of changing from tasks devised and marked externally to using teachers' judgements of students' performance in special tasks and in regular work
The identification of factors that support teachers' use of summative assessment to improve students' learning experience: that is, how the formative use of assessment can be integrated with the summative use
Direct comparison of different approaches used by teachers in summative assessment to investigate whether they make any difference to outcomes or to impact on students
Investigation of what information is actually used by teachers in their assessment and what impact this has on the curriculum experienced by students
The role of student self-assessment in summative assessment
The impact on students of developing their awareness of success criteria and providing exemplification of learning goals
What changes to accountability procedures would preserve the integrity of teachers' assessment and minimise pressures to give inflated grades or levels.

References

Abbott D, Broadfoot P, Croll P, Osborn M, Pollard A (1994) Some sink, some float: National Curriculum assessment and accountability. British Educational Research Journal 20: 155–174.

Bennett SN, Wragg EC, Carre CG, Carter DSG (1992) A longitudinal study of primary teachers’ perceived competence in, and concerns about, National Curriculum implementation. Research Papers in Education 7: 53–78.

Bennett RE, Gottesman RL, Rock DA, Cerullo F (1993) Influence of behaviour, perceptions and gender on teachers’ judgements of students’ academic skill. Journal of Educational Psychology 85: 347–356.

Brookhart SM, DeVoge JG (1999) Testing a theory about the role of classroom assessment in student motivation and achievement. Applied Measurement in Education 12: 409–425.

Bullock K, Bishop KN, Martin S, Reid A (2002) Learning from coursework in English and geography. Cambridge Journal of Education 32: 325–340.

Carter CR (1997/8) Assessment: shifting the responsibility. Journal of Secondary Gifted Education 9: 68–75.

Cizek GJ, Fitzgerald SM, Rachor RE (1995/6) Teachers’ assessment practices: preparation, isolation and the kitchen sink. Educational Assessment 3: 159–179.

Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI Centre) (2002a) Core Keywording Strategy: Data collection for a register of educational research. Version 0.9.7. London: EPPI Centre, Social Science Research Unit.

EPPI Centre (2002b) Review Guidelines for Extracting Data and Quality Assessing Primary Studies in Educational Research. Version 0.9.7. London: EPPI Centre, Social Science Research Unit.

Flexer RJ, Cumbo K, Borko H, Mayfield V, Marion SF (1995) How ‘messing about’ with performance assessment in mathematics affects what happens in classrooms (Technical Report 396). Los Angeles: Centre for Research on Evaluation, Standards and Student Testing (CRESST). Available from: http://cresst96.cse.ucla.edu/Reports/TECH396.PDF

Gipps C, Clarke S (1998) Monitoring consistency in teacher assessment and the impact of SCAA’s guidance materials at Key Stages 1, 2, and 3: Final report. London: QCA.

Hall K, Webber B, Varley S, Young V, Dorman P (1997) A study of teacher assessment at Key Stage 1. Cambridge Journal of Education 27: 107–122.

Hall K, Harding A (2002) Level descriptions and teacher assessment in England: towards a community of assessment practice. Educational Research 44: 1–15.

Hiebert E, Davinroy K (1993) Dilemmas and issues in implementing classroom-based assessment for literacy (Technical Report 365). Los Angeles: Centre for Research on Evaluation, Standards and Student Testing (CRESST). Available from: http://www.cse.ucla.edu/CRESST/Reports/TECH365.PDF

Hill M (2002) Focussing the teacher’s gaze: primary teachers reconstructing assessment in self managing schools. Educational Research for Policy and Practice 1: 113–125.

Iredale C (1990) Pupils’ attitudes towards GASP (Graded Assessments in Science Project). School Science Review 72: 133–137.

Johnston PH, Afflerbach P, Weiss PB (1993) Teachers’ assessment of the teaching and learning of literacy. Educational Assessment 1: 91–117.

Koretz D, Stecher BM, Klein S, McCaffrey D (1994) The Vermont Portfolio Assessment Program: findings and implications. Educational Measurement: Issues and Practice 13: 5–16.

McCallum B, McAlister S, Brown M, Gipps C (1993) Teacher assessment at Key Stage One. Research Papers in Education 8: 305–328.

Morgan C (1996) The teacher as examiner: the case of mathematics coursework. Assessment in Education 3: 353–375.

Pilcher JK (1994) The value-driven meaning of grades. Educational Assessment 2: 69–88.

Stables A (1992) Speaking and listening at Key Stage 3: some problems of teacher assessment. Educational Research 34: 107–115.

Valencia SW, Au KH (1997) Portfolios across educational contexts: issues for evaluation, teacher development and system validity. Educational Assessment 4: 1–35.

Whetton C, Sainsbury M, Hopkins S, Ashby J, Christophers U, Clarke J, Heath M, Jones G, Punchers J, Schagen I, Wilson J (1991) A Report on Teacher Assessment. London: SEAC.

Yung B (2002) Same assessment, different practice: professional consciousness as a determinant of teachers' practice in a school-based assessment scheme. Assessment in Education 9: 97–117.

This report should be cited as: Harlen W (2004) A systematic review of the evidence of the impact on students, teachers and the curriculum of the process of using assessment by teachers for summative purposes. In Research Evidence in Education Library. London: EPPI Centre, Social Science Research Unit, Institute of Education, University of London.