The influence of rater empathy, age and experience on writing performance assessment

Pilvi Alp

Foundation Innove, Estonia

Anu Epner

Foundation Innove, Estonia

Hille Pajupuu

Institute of the Estonian Language, Estonia


Abstract

Assessment reliability is vital in language testing. We studied the influence of rater empathy, age and experience on the assessment of the writing component of Estonian language proficiency examinations at levels A2–C1, as well as the effect of these rater properties on rater performance at different language levels. The study included 5,270 examination papers, each assessed by two raters. The raters were aged 34–73 and had 3–15 years of rating experience. The empathy level (EQ) of all 26 A2–C1 raters had previously been measured with Baron-Cohen and Wheelwright’s self-report questionnaire. Correlation analysis indicated that, for raters who receive regular training and have three or more years of experience, the rater’s level of empathy, age and experience did not have a significant effect on the score.

Keywords: rater effects, rater reliability, empathy, L2 writing, assessment

References

Alderson, J. C., Clapham, C., & Wall, D. (1996). Language test construction and evaluation. Cambridge: CUP.

Altrov, R., Pajupuu, H., & Pajupuu, J. (2013). The role of empathy in the recognition of vocal emotions. Interspeech 2013, 1341–1344. Retrieved from http://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_1341.pdf

Allison, C., Baron-Cohen, S., Wheelwright, S., Stone, M. H., & Muncer, S. J. (2011). Psychometric analysis of the Empathy Quotient (EQ). Personality and Individual Differences, 51(7), 829–835. http://dx.doi.org/10.1016/j.paid.2011.07.005

Ang-Aw, H. T., & Goh, C. C. M. (2011). Understanding discrepancies in rater judgement on national-level oral examination tasks. RELC Journal, 42(1), 31–51. http://dx.doi.org/10.1177/0033688210390226

Attali, Y. (2016). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99–115. http://dx.doi.org/10.1177/0265532215582283

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: OUP.

Baron-Cohen, S., & Wheelwright, S. (2004). The Empathy Quotient: An investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders, 34(2), 163–175. http://dx.doi.org/10.1023/B:JADD.0000022607.19833.00

Canter, D., Youngs, D., & Yaneva, M. (2017). Towards a measure of kindness: An exploration of a neglected interpersonal trait. Personality and Individual Differences, 106, 15–20. http://dx.doi.org/10.1016/j.paid.2016.10.019

CEFR (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: CUP.

Chalhoub-Deville, M. (1995). Deriving oral assessment scales across different tests and rater groups. Language Testing, 12(1), 16–33. http://dx.doi.org/10.1177/026553229501200102

Chuang, Y. Y. (2010). How teachers’ background differences affect their rating in EFL oral proficiency assessment. Retrieved from http://ir.csu.edu.tw/handle/987654321/1944

Dewberry, C., Davies-Muir, A., & Newell, S. (2013). Impact and causes of rater severity/leniency in appraisals without postevaluation communication between raters and ratees. International Journal of Selection and Assessment, 21(3), 286–293. http://dx.doi.org/10.1111/ijsa.12038

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185. http://dx.doi.org/10.1177/0265532207086780

Fahim, M., & Bijani, H. (2011). The effect of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1(1), 1–16. Retrieved from http://www.ijlt.ir/portal/files/401-2011-01-01.pdf

Language Act (2011). RT I, 18.03.2011, 1.

Leckie, G., & Baird, J.-A. (2011). Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency, and rater experience. Journal of Educational Measurement, 48(4), 399–418. http://dx.doi.org/10.1111/j.1745-3984.2011.00152.x

Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, Michigan, United States. Retrieved from http://hdl.handle.net/2027.42/64665

Ling, G., Mollaun, P., & Xi, X. (2014). A study on the impact of fatigue on human raters when scoring speaking responses. Language Testing, 31(4), 479–499. http://dx.doi.org/10.1177/0265532214530699

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246–276. http://dx.doi.org/10.1191/0265532202lt230oa

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. Frankfurt am Main: Peter Lang.

Luoma, S. (2004). Assessing speaking. Cambridge: CUP.

Matsumoto, K., & Kumamoto, T. (n.d.). A study on rater related variables in the evaluation of L2 writing. 104–115. Retrieved from http://www.paaljapan.org/resources/proceedings/PAAL11/pdfs/09.pdf

McNamara, J. (2000). The effects of empathy on speech rating. Unpublished master’s thesis. Eastern Illinois University, Charleston, Illinois, United States. Retrieved from http://thekeep.eiu.edu/theses/1464

McNamara, T. (1996). Measuring second language performance. Harlow, Essex: Pearson Education.

Mei, W. S. (2010). Investigating raters’ use of analytic descriptors in assessing writing. Reflections in English Language Teaching, 9(2), 69–104. Retrieved from http://www.nus.edu.sg/celc/research/books/relt/vol9/no2/069to104_wu.pdf

Muncer, S. J., & Ling, J. (2006). Psychometric analysis of the empathy quotient (EQ) scale. Personality and Individual Differences, 40(6), 1111–1119. http://dx.doi.org/10.1016/j.paid.2005.09.020

R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Retrieved from http://www.R-project.org/

Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate compositions. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 129–152). Cambridge: CUP.

Shi, L., Wang, W., & Wen, Q. (2003). Teaching experience and evaluation of second-language students’ writing. Canadian Journal of Applied Linguistics, 6(2), 219–236. Retrieved from https://journals.lib.unb.ca/index.php/CJAL/article/view/19798

Stemler, S. E., & Tsai, J. (2008). Best practice in interrater reliability: Three common approaches. In J. W. Osborne (Ed.), Best practice in quantitative methods (pp. 29–49). Los Angeles, CA: SAGE Publications.

Walter, H. (2012). Social cognitive neuroscience of empathy: Concepts, circuits, and genes. Emotion Review, 4(1), 9–17. http://dx.doi.org/10.1177/1754073911421379

Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing, 11(2), 197–223.

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263–287. http://dx.doi.org/10.1177/026553229801500205

Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145–178. http://dx.doi.org/10.1016/S1075-2935(00)00010-6

Weigle, S. C. (2002). Assessing writing. Cambridge: CUP.

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Houndmills, Hampshire, UK: Palgrave Macmillan.

Published
30-12-2017


Alp, P., Epner, A., & Pajupuu, H. (2017). The influence of rater empathy, age and experience on writing performance assessment. LingBaW. Linguistics Beyond and Within, 3(1), 7–19. https://doi.org/10.31743/lingbaw.5647