Open Access

To What Extent Do Non-teacher Raters Differ from Teacher Raters on Assessing Story-retelling


DOI: 10.23977/langta.2018.11001


Lu Weilie 1


1 Guangdong University of Foreign Studies, South China Business College, Guangzhou, China

Corresponding Author

Lu Weilie


The present study explores the extent to which non-teacher raters differ from teacher raters in assessing story-retelling in the Guangdong version of China’s National Matriculation English Test (NMET GD). A Facets analysis suggests that the two rater groups are comparable in internal consistency and severity. The raters’ written comments show that both groups followed a similar pattern in the criteria categories they focused on when making assessments. The only difference lay in how they commented: teacher raters’ comments tended to be more specific, while non-teacher raters’ comments were more general. Based on these findings, we conclude that non-teachers (college or graduate students) are qualified to rate high-stakes tests such as the story-retelling task in NMET GD.


Non-teacher Raters, Teacher Raters, Story-retelling, Consistency, Severity, Written comments.


Weilie, L., To What Extent Do Non-teacher Raters Differ from Teacher Raters on Assessing Story-retelling. Journal of Language Testing & Assessment (2018) Vol. 1: 1-13.



All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.