For every college professor, teaching is an important part of their job. For most college professors, who are not located at relatively few research-oriented universities, teaching the main part of their job. So how can we evaluate whether teaching is being done well or poorly? This question applies both at the individual level, but also for bigger institutional questions: for example, are faculty with lifetime tenure, who were granted tenure in substantial part for their performance as researchers, better teachers than faculty with short-term contracts? David Figlio and Morton Schapiro tackle such questions in \”Staffing the Higher Education Classroom\” (Journal of Economic Perspectives, Winter 2021, 35:1, 143-62).
The question of how to evaluate college teaching isn\’t easy. For example, there are not annual exams as often occur at the K-12 level, nor are certain classes followed by a common exam like the AP exams in high school. My experience is that the faculty colleges and universities are not especially good at self-policing of teaching. In some cases, newly hired faculty get some feedback and guidance, and there are hallway discussions about especially awful teachers, but that\’s about it. Many colleges and universities have questionnaires on which students can evaluate faculty. This is probably a better method than throwing darts in the dark, but it is also demonstrably full of biases: students may prefer easier graders, classes that require less work, or classes with an especially charismatic professor. There is a developed body of evidence that white American faculty members tend to score higher. Figlio and Schapiro write:
Concerns about bias have led the American Sociological Association (2019) to caution against over-reliance on student evaluations of teaching, pointing out that “a growing body of evidence suggests that their use in personnel decisions is problematic” given that they “are weakly related to other measures of teaching effectiveness and student learning” and that they “have been found to be biased against women and people of color.” The ASA suggests that “student feedback should not be used alone as a measure of teaching quality. If it is used in faculty evaluation processes, it should be considered as part of a holistic assessment of teaching effectiveness.” Seventeen other scholarly associations, including the American Anthropological Association, the American Historical Association, and the American Political Science Association, have endorsed the ASA report …
Figlio and Schapiro suggest two measures of effective teaching for intro-level classes: 1) how many students from a certain intro-level teacher go on to become majors in the subject, and 2) \”deep learning,\” which is combination of how many in an intro-level class go on to take any additional classes in a subject, and do whether students from a certain teacher tend to perform better in those follow-up classes. They authors are based at Northwestern University, and so they were able to obtain \”registrar data on all Northwestern University freshmen who entered between fall 2001 and fall 2008, a total of 15,662 students, and on the faculty who taught them during their first quarter at Northwestern.\”
But what if state legislators take seriously our finding that while top teachers don’t sacrifice research output, it is also the case that top researchers don’t teach exceptionally well? Why have those high-priced scholars in the undergraduate classroom in the first place? Surely it would be more cost-efficient to replace them in the classroom either with untenured, lower-paid professors, or with faculty not on the tenure-line in the first place. That, of course, is what has been happening throughout American higher education for the past several decades, as we discuss in detail in the section that follows. And, of course, there’s the other potentially uncomfortable question that our analysis implies: Should we be concerned about the possibility that the weakest scholars amongst the tenured faculty are no more distinguished in the classroom than are the strongest scholars? Should expectations for teaching excellence be higher for faculty members who are on the margin of tenurability on the basis of their research excellence?
Figlio and Schapiro then extend their analysis to looking at the teaching quality of non-tenure track faculty. Their results here do need to be interpreted with care, given that non-tenure contract faculty at Northwestern often operate with three-year renewable contracts, and most of these faculty in this category are in their second or later contract. They write:
Thus, our results should be viewed in the context of where non-tenure faculty at a major research university function as designated teachers (both full-time and part-time) with long-term relationships to the university. We find that, on average, tenure-line faculty members do not teach introductory undergraduate courses as well as do their (largely full-time, long-term) contingent faculty counterparts. In other words, our results suggest that on average, first-term freshmen learn more from contingent faculty members than they do from tenure track/tenured faculty.
Specifically, we use student panel data from the United States Naval Academy (USNA), where freshmen and sophomores must take a set of mandatory sequential courses, which includes courses in the humanities, social sciences, and STEM disciplines. Students cannot directly choose which courses to take nor when to take them. They cannot choose their instructors. They cannot switch instructors at any point. They must take the core sequence regardless of interest or ability.\” In addition:
Due to unique institutional features, we observe students’ administratively recorded grades at different points during the semester, including a cumulative course grade immediately prior to the final exam, a final exam grade, and an overall course grade, allowing us to separately estimate multiple aspects of faculty value-added. Given that instructors determine the final grades of their students, there are both objective and subjective components of any academic performance measure. For a subset of courses inour sample, however, final exams are created, administered, and graded by faculty who do not directly influence the final course grade. This enables us to disentangle faculty impacts on objective measures of student learning within a course (grade on final exam) from faculty-specific subjective grading practices (final course grade). Using the objectively determined final exam grade, we measure the direct impact of the instructor on the knowledge learned by the student.
We find that instructors who help boost the common final exam scores of their students also boost their performance in the follow-on course. Instructors who tend to give out easier subjective grades however dramatically hurt subsequent student performance. Exploring a variety of mechanisms, we suggest that instructors harm students not by “teaching to the test,” but rather by producing misleading signals regarding the difficulty of the subject and the “soft skills” needed for college success. This effect is stronger in non-STEM fields, among female students, and among extroverted students. Faculty that are well-liked by students—and thus likely prized by university administrators—and considered to be easy have particularly pernicious effects on subsequent student performance.
Again, this result is based on data from a nonrepresentative academic institution. But it does suggest some dangers of relying on contemporaneous popularity among students as a measure of teaching performance.