A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes. User perspective: a teacher

As a practising teacher, I was invited to participate with a team of leading educational researchers who were reviewing research on teacher assessment as a dependable route to measuring student performance. This summary details what we found.

Why was this review undertaken?

The recent 'black box' review of classroom assessment indicated that assessment used for formative purposes benefits teaching and learning, and raises standards of student performance. However, a subsequent review of the impact of examinations and tests on students' motivation for learning showed that external high-stakes tests can have a negative impact on students' desire to learn and on the curriculum and pedagogy. Even so, summative assessment is necessary: it serves the important purpose of summarising students' achievement and progress for their teachers, parents, the students themselves, and others who need this information. To serve this purpose effectively, summative assessment should interfere as little as possible with teaching methods and the curriculum. Moreover, the means of assessment should reflect the full range of learning outcomes, particularly those which demonstrate the ability to transfer understanding into new authentic settings, and which promote each student's capacity to become more independent in their learning - to learn how to learn.

Rather than considering examinations and testing as the only suitable route to summative assessment, this review looked more closely at assessment by teachers. There is no doubt that this has the potential to provide summative information about students' achievement, since teachers can build up a picture of students' attainments across a full range of activities and goals. In a number of countries, assessment by teachers is used as the main source of information in national and state assessment systems; but in other countries it is perceived as unreliable and subject to bias. This review was undertaken to provide some research evidence about the dependability of summative assessment by teachers and the conditions which affect it.

What questions did this review seek to answer?

  • What is the research evidence of the reliability and validity of assessment by teachers for the purpose of summative assessment?
  • What conditions affect the reliability and validity of teachers' summative assessment?

How was the review undertaken?

The review methodology followed the procedures devised by the EPPI-Centre, with a wide-ranging search for published research studies that dealt with some form of summative assessment conducted by teachers and involved school students aged 4 to 18. The search found a total of 431 studies. Of these, 30 were included in the in-depth review because they contained specific relevant findings.

Table 1: Validity, reliability and classroom impact - external tests, examinations and teacher assessment

  Column headings:
  • Validity - does the approach give a fair assessment of what it claims to measure?
  • Reliability - are the outcomes of the assessment reproducible?
  • Classroom impact - what impact does this assessment have on the classroom?

External tests / exams
  • Validity: External tests and examinations are perceived as having high levels of validity. However, the skills and knowledge being tested do not always appear to be transferable, and the tests can be viewed as artificial rather than authentic. The claim of high validity is not well-supported by evidence.
  • Reliability: External tests and examinations are perceived as having high levels of reliability. Despite the use of rigorous mark-schemes, moderation and scrutiny procedures, the claim of high reliability is not well-supported by evidence.
  • Classroom impact: External tests and examinations are known to have negative impact on students' motivation for learning, negative impact on curriculum content ('what is taught is what is tested'), and negative impact on teaching approaches (excessive test practice, and 'chalk and talk' approaches predominate).

Teacher assessment
  • Validity: [empty box]
  • Reliability: [empty box]
  • Classroom impact: Teacher assessment, used for formative purposes, benefits teaching (through a greater emphasis on responding to students' known needs), benefits learning (by encouraging activities that promote understanding), and raises standards of student performance.

The review sought to identify evidence that informs us about the two 'empty boxes' in the table, i.e. the validity and reliability of teacher assessment for summative purposes. How dependable is teachers' assessment for this purpose?

What is the research evidence and what are the implications?

The reliability of teacher assessment depends on either the provision of closely specified tasks, or the shared development of well understood, carefully formulated criteria.

  • Teachers who have participated in developing criteria are able to use them reliably in rating students' work.
  • The reliability of portfolio assessment where tasks were not closely specified was low. This finding has been used as an argument for increasing the match between task and assessment criteria by closer specification of tasks.
  • The finer specification of criteria, describing progressive levels of competency, has been shown to be capable of supporting reliable teacher assessment while allowing evidence to be used from the full range of classroom work.
  • The clearer teachers are about the goals of students' work, the more consistently they apply assessment criteria. Detailed criteria describing levels of progress in various aspects of achievement enable teachers to assess students reliably on the basis of regular classroom work.
  • Teachers are able to score hands-on science investigations and projects with high reliability using detailed scoring criteria.
  • The training required for teachers to improve the reliability of their assessment should involve teachers as far as possible in the process of identifying criteria so as to develop ownership of them and understanding of the language used.

Teacher assessment can be shown to be consistent with other 'external' assessments; although, since teacher assessment is intended to be a more all-embracing measure, there is bound to be some disagreement.

  • Results of teacher assessment and standard tasks agree to an extent. This agreement is at a level that suggests that the two approaches assess similar but not identical achievements. There is conflicting evidence as to the relationship between teachers' ratings of students' achievement and standardised test scores of the same achievement when the ratings are not based on specific criteria.
  • Any interpretation of the correlations between teacher assessment and standard task results needs to take into account the variability in the administration of standard tasks, especially with younger pupils.

Teacher assessment can be an aid to promoting quality teaching.

  • The introduction of teachers' assessment as part of the national curriculum assessment initially had a beneficial effect on teachers' planning and was integrated into teaching. Subsequently, the earlier collaboration among teachers and sharing of interpretations of criteria declined, as support for teacher assessment fell away and the focus changed to other initiatives.
  • Moderation through professional collaboration is of benefit to teaching and learning as well as to assessment. Reliable assessment needs protected time for teachers to meet and to take advantage of the support that others, including assessment advisers, can give.

Teacher assessment can be inappropriately influenced by a tendency to be too generous, and to exhibit bias towards good students (the 'halo effect'); but this can be addressed through appropriate training and suitable assessment structures.

  • When rating students' oral proficiency in a foreign language, teachers are consistently more lenient than moderators, but are able to place students in the same rank order as experienced examiners.
  • Studies with pupils aged 6 and 7 in England and Wales in the 1990s found considerable error and evidence of bias in relation to different groups of students.
  • Teachers' judgements of the academic performance of young children are influenced by the teachers' assessment of their behaviour. This adversely affects the assessment of boys compared with girls.
  • There is bias in teachers' assessment relating to student characteristics, including behaviour (for young children), gender and special educational needs. Overall academic achievement and verbal ability may also influence judgement when assessing specific skills.
  • Training for teacher assessment should focus on the sources of potential bias that have been revealed by research.

The evidence suggests that procedures used for the assessment, and how well teachers follow them, affect the validity of teacher assessment.

  • Teachers' judgements of young students were found to have high validity when guided by checklists and other materials in a work sampling system. However, there was tentative evidence that in another approach to portfolios, where criteria were not well defined, the validity was low.
  • There is no clear view of how the reliability and validity of teacher assessment vary with the subject assessed. Differences between subjects in how teacher assessment compares with standard tasks or examination results have been found, but there is no consistent pattern suggesting that assessment in one subject is more or less reliable than in another.
  • It is important for teachers to follow agreed procedures if teacher assessment is to be sufficiently dependable to serve summative purposes. In efforts to increase reliability, there is a tension between closer specification of the task and of the conditions under which it is carried out, and closer specification of the criteria for judging performance.

Teachers and policy-makers need to recognise the potential of teacher assessment as a valid and reliable alternative to external tests and examinations. Although there are known problems associated with some approaches to teacher assessment, the opportunity exists to adopt an assessment framework that encourages pupils to demonstrate their understanding across multiple tasks in authentic settings. It is clear that the effective introduction of such an approach demands adequate staff training, teacher participation in the design, and carefully considered structures.

By promoting more formative approaches to assessment, teachers can be expected to adopt more beneficial strategies in the classroom, and pupils can expect to learn with greater understanding, retention and enjoyment.

