Proposals for evaluating the classroom performance of K-12 teachers are typically based on hopes and fears, not on actual evidence. Those who support such evaluations hope to improve the quality of teaching by linking evaluations to teacher pay and jobs. The teachers' unions that typically oppose such evaluations fear that they will be used arbitrarily, punitively, or even whimsically, and in any case in a way that will make teaching an even harder job.
The dispute seems intractable. But in the December 2012 issue of the American Economic Review, Eric S. Taylor (no relation!) and John H. Tyler offer real-world evidence on "The Effect of Evaluation on Teacher Performance." (The AER is not freely available online, but many in academia will have access through library subscriptions.)
Taylor and Tyler have evidence on a sample of a little more than 100 mid-career math teachers in the Cincinnati Public Schools in fourth through eighth grade. These teachers were hired between 1993–1994 and 1999–2000. Then in 2000, a district planning process called for these teachers to be evaluated in a year-long classroom observation-based program, which then occurred some time between 2003–2004 and 2009–2010. The order in which teachers were chosen for evaluation, and the year in which the evaluation occurred, were for practical purposes random. The evaluation itself was based on observation of classroom teaching. But the researchers were also able to collect evidence on math test scores for students. Although these scores were not part of the teacher evaluation, the researchers could then look to see whether the teacher evaluation process affected student scores. (Indeed, one of the reasons for looking at math teachers was that scores on a math test provide a fairly good measure of student performance, compared with other subjects.) Again, these were mid-career teachers who typically had not been evaluated in any systematic way for years.
Here's how the evaluation process worked: "During the TES [Teacher Evaluation System] evaluation year, teachers are typically observed in the classroom and scored four times: three times by an assigned peer evaluator—high-performing, experienced teachers who are external to the school—and once by the principal or another school administrator. Teachers are informed of the week during which the first observation will occur, with all other observations being unannounced. The evaluation measures dozens of specific skills and practices covering classroom management, instruction, content knowledge, and planning, among other topics. Evaluators use a scoring rubric, based on Charlotte Danielson’s Enhancing Professional Practice: A Framework for Teaching (1996), which describes performance of each skill and practice at four levels: “Distinguished,” “Proficient,” “Basic,” and “Unsatisfactory.” … After each classroom observation, peer evaluators and administrators provide written feedback to the teacher, and meet with the teacher at least once to discuss the results."
A common pattern appears in these kinds of subjective evaluations: the evaluators are often pretty tough in grading and commenting on lots of specific skills and practices, but then they still tend to give a high overall grade. This pattern occurred here as well. The authors write: "More than 90 percent of teachers receive final overall TES scores in the “Distinguished” or “Proficient” categories. Leniency is much less frequent in the individual rubric items and individual observations …"
In theory, teachers who were fairly new to the district could lose their jobs if their evaluation score was low enough, and those who scored very high could get a raise. But because almost everyone ended up with a fairly high overall score, the practical effects of this evaluation in terms of pay and jobs were pretty minimal.
Nevertheless, student performance not only went up during the year that the evaluation happened, but student performance stayed higher for teachers who had been evaluated in previous years. "The estimates presented here—greater teacher productivity as measured by student achievement gains in years following TES evaluation—strongly suggest that teachers develop skill or otherwise change their behavior in a lasting manner as a result of undergoing subjective performance evaluation in the TES system. Imagine two students taught by the same teacher in different years who both begin the year at the fiftieth percentile of math achievement. The student taught after the teacher went through comprehensive TES evaluation would score about 4.5 percentile points higher at the end of the year than the student taught before the teacher went through the evaluation. … Indeed, our estimates indicate that postevaluation improvements in performance were largest for teachers whose performance was weakest prior to evaluation, suggesting that teacher evaluation may be an effective professional development tool."
By the standards typically prevailing in K-12 education, the idea that teachers should experience an actual classroom evaluation consisting of four visits in a year, maybe once a decade or so, would have to be considered highly interventionist, which is ludicrous. Too many teachers perceive their classroom as a private zone where they should not and perhaps cannot be judged. But teaching is a profession, and the job performance of professionals should be evaluated by other professionals. The Cincinnati evidence strongly suggests that detailed, low-stakes, occasional evaluation by other experienced teachers can improve the quality of teaching over time. Maybe if some of the school reformers backed away from trying to attach potentially large consequences to such evaluations in terms of pay and jobs, at least a few teachers' unions would be willing to support this step toward a higher quality of teaching.
Note: Some readers might also be interested in this earlier post from October 3, 2011, "Low-Cost Education Reforms: Later Starts, K-8, and Focusing Teachers."