The influence of rater empathy, age and experience on writing performance assessment
Pilvi Alp
Foundation Innove, Estonia

Anu Epner
Foundation Innove, Estonia

Hille Pajupuu
Institute of the Estonian Language, Estonia

Abstract
Assessment reliability is vital in language testing. We have studied the influence of empathy, age and experience on the assessment of the writing component of the Estonian language proficiency examinations at levels A2–C1, and the effect of these rater properties on rater performance at different language levels. The study included 5,270 examination papers, each assessed by two raters. Raters were aged 34–73 and had 3–15 years of rating experience. The empathy level (EQ) of all 26 A2–C1 raters had previously been measured with Baron-Cohen and Wheelwright's self-report questionnaire. The results of the correlation analysis indicated that, given regular training (and three or more years of experience), the rater's level of empathy, age and experience did not have a significant effect on the scores awarded.
Keywords:
rater effects, rater reliability, empathy, L2 writing, assessment

References
Altrov, R., Pajupuu, H., & Pajupuu, J. (2013). The role of empathy in the recognition of vocal emotions. Interspeech 2013, 1341–1344. Retrieved from http://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_1341.pdf
Alderson, J. C., Clapham, C., & Wall, D. (1996). Language test construction and evaluation. Cambridge: CUP.
Allison, C., Baron-Cohen, S., Wheelwright, S., Stone, M. H., & Muncer, S. J. (2011). Psychometric analysis of the Empathy Quotient (EQ). Personality and Individual Differences, 51(7), 829–835. http://dx.doi.org/10.1016/j.paid.2011.07.005
Ang-Aw, H. T., & Goh, C. C. M. (2011). Understanding discrepancies in rater judgement on national-level oral examination tasks. RELC Journal, 42(1), 31–51. http://dx.doi.org/10.1177/0033688210390226
Attali, Y. (2016). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99–115. http://dx.doi.org/10.1177/0265532215582283
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: OUP.
Baron-Cohen, S., & Wheelwright, S. (2004). The Empathy Quotient: An investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders, 34(2), 163–175. http://dx.doi.org/10.1023/B:JADD.0000022607.19833.00
Canter, D., Youngs, D., & Yaneva, M. (2017). Towards a measure of kindness: An exploration of a neglected interpersonal trait. Personality and Individual Differences, 106, 15–20. http://dx.doi.org/10.1016/j.paid.2016.10.019
CEFR (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: CUP.
Chalhoub-Deville, M. (1995). Deriving oral assessment scales across different tests and rater groups. Language Testing, 12(1), 16–33. http://dx.doi.org/10.1177/026553229501200102
Chuang, Y. Y. (2010). How teachers’ background differences affect their rating in EFL oral proficiency assessment. Retrieved from http://ir.csu.edu.tw/handle/987654321/1944
Dewberry, Ch., Davies-Muir, A., & Newell, S. (2013). Impact and causes of rater severity/leniency in appraisals without postevaluation communication between raters and ratees. International Journal of Selection and Assessment, 21(3), 286–293. http://dx.doi.org/10.1111/ijsa.12038
Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185. http://dx.doi.org/10.1177/0265532207086780
Fahim, M., & Bijani, H. (2011). The effect of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1(1), 1–16. Retrieved from http://www.ijlt.ir/portal/files/401-2011-01-01.pdf
Language Act (2011). RT I, 18.03.2011, 1.
Leckie, G., & Baird, J.-A. (2011). Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency, and rater experience. Journal of Educational Measurement, 48(4), 399–418. http://dx.doi.org/10.1111/j.1745-3984.2011.00152.x
Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, Michigan, United States. Retrieved from http://hdl.handle.net/2027.42/64665
Ling, G., Mollaun, P., & Xi, X. (2014). A study on the impact of fatigue on human raters when scoring speaking responses. Language Testing, 31(4), 479–499. http://dx.doi.org/10.1177/0265532214530699
Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246–276. http://dx.doi.org/10.1191/0265532202lt230oa
Lumley, T. (2005). Assessing second language writing. The rater’s perspective. Frankfurt am Main: Peter Lang.
Luoma, S. (2004). Assessing speaking. Cambridge: CUP.
Matsumoto, K., & Kumamoto, T. (n.d.). A study on rater related variables in the evaluation of L2 writing. Proceedings of PAAL 11, 104–115. Retrieved from http://www.paaljapan.org/resources/proceedings/PAAL11/pdfs/09.pdf
McNamara, J. (2000). The effects of empathy on speech rating. Unpublished master’s thesis. Eastern Illinois University, Charleston, Illinois, United States. Retrieved from http://thekeep.eiu.edu/theses/1464
McNamara, T. (1996). Measuring second language performance. Harlow Essex: Pearson Education.
Mei, W. S. (2010). Investigating raters’ use of analytic descriptors in assessing writing. Reflections in English Language Teaching, 9(2), 69–104. Retrieved from http://www.nus.edu.sg/celc/research/books/relt/vol9/no2/069to104_wu.pdf
Muncer, S. J., & Ling, J. (2006). Psychometric analysis of the empathy quotient (EQ) scale. Personality and Individual Differences, 40(6), 1111–1119. http://dx.doi.org/10.1016/j.paid.2005.09.020
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org/
Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate compositions. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129–152). Cambridge: CUP.
Shi, L., Wang, W., & Wen, Q. (2003). Teaching experience and evaluation of second-language students’ writing. Canadian Journal of Applied Linguistics, 6(2), 219–236. Retrieved from https://journals.lib.unb.ca/index.php/CJAL/article/view/19798
Stemler, S. E., & Tsai, J. (2008). Best practice in interrater reliability: Three common approaches. In J. W. Osborne (Ed.), Best practice in quantitative methods (pp. 29–49). Los Angeles, CA: SAGE Publications.
Walter, H. (2012). Social cognitive neuroscience of empathy: Concepts, circuits, and genes. Emotion Review, 4(1), 9–17. http://dx.doi.org/10.1177/1754073911421379
Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing, 11(2), 197–223.
Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263–287. http://dx.doi.org/10.1177/026553229801500205
Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145–178. http://dx.doi.org/10.1016/S1075-2935(00)00010-6
Weigle, S. C. (2002). Assessing writing. Cambridge: CUP.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Houndmills, Basingstoke, Hampshire, UK: Palgrave Macmillan.