Year 2019, Volume 6, Issue 2, Pages 218-234. Published 2019-06-30.

Investigating a new method for standardising essay marking using levels-based mark schemes

Jackie Greatorex [1] , Tom Sutch [2] , Magda Werno [3] , Jess Bowyer [4] , Karen Dunn [5]


Standardisation is a procedure used by awarding organisations to maximise marking reliability by teaching examiners to judge scripts consistently using a mark scheme. However, research shows that people are better at comparing two objects than at judging each object individually. Consequently, Oxford, Cambridge and RSA (OCR, a UK awarding organisation) proposed investigating a new procedure, involving ranking essays, in which essay quality is judged in comparison to other essays. This study investigated the marking reliability yielded by traditional standardisation and by ranking standardisation. The study entailed a marking experiment followed by a questionnaire completed by the examiners. In the control condition, live procedures were emulated as authentically as possible within the confines of a study. The experimental condition involved ranking the quality of essays from best to worst and then assigning marks. After each standardisation procedure, the examiners marked 50 essays from an AS History unit. All participants experienced both procedures, and marking reliability was measured. Additionally, the participants’ questionnaire responses were analysed to gain insight into the examiners’ experience. It is concluded that the Ranking Procedure is unsuitable for use in public examinations in its current form: the Traditional Procedure produced statistically significantly more reliable marking, whilst the Ranking Procedure involved a complex decision-making process. However, the Ranking Procedure produced slightly more reliable marking at the extremities of the mark range, where previous research has shown that marking tends to be less reliable.
Keywords: Comparative judgement, Marking, Standardisation, Reliability, Essay
Primary Language en
Subjects Education, Scientific Disciplines
Published Date June
Journal Section Articles
Authors

Orcid: 0000-0002-2303-0638
Author: Jackie Greatorex (Primary Author)
Institution: Cambridge Assessment
Country: United Kingdom


Orcid: 0000-0001-8157-277X
Author: Tom Sutch
Institution: Cambridge Assessment
Country: United Kingdom


Author: Magda Werno
Institution: Cambridge Assessment
Country: United Kingdom


Author: Jess Bowyer
Institution: University of Exeter
Country: United Kingdom


Orcid: 0000-0002-7499-9895
Author: Karen Dunn
Institution: British Council

Bibtex
@article{ijate564824,
  journal = {International Journal of Assessment Tools in Education},
  eissn   = {2148-7456},
  address = {İzzet KARA},
  year    = {2019},
  volume  = {6},
  pages   = {218--234},
  doi     = {10.21449/ijate.564824},
  title   = {Investigating a new method for standardising essay marking using levels-based mark schemes},
  author  = {Greatorex, Jackie and Sutch, Tom and Werno, Magda and Bowyer, Jess and Dunn, Karen}
}