I have a bad habit of using quotation marks a lot in my writing. I'm not sure where that particular tic comes from. But I believe in the case above, "growth" and "measured" deserve quotes.
My Algebra 1 colleagues and I developed a test to be given at the beginning and at the end of the first quarter. Another colleague in a different discipline worked with others to create 4 such tests - one for each quarter. I can only assume that other teachers spent the afternoon doing similar work. And at the end of the day, what had we all produced?
- Tests that will ultimately be used to evaluate us first, students second. The state of New Jersey now has a law requiring that 50% of a teacher's evaluation be based on student assessments.
- Tests that were developed in an ad hoc fashion. How confident are we, for example, that these tests measure anything worth measuring? How confident are we that a measure of student progress made in this way is any better than the traditional way - that is, teachers get to know their students, they assess them, and by their assessments determine whether or not the students understand a particular set of content?
- Tests that, depending on how the individual departments structured them, could very well be inappropriate for the task they're supposed to be used for. Hypothetically, one could imagine the physics benchmark exam at my school being used simultaneously for 9th grade physics (including "modified" physics for classes with a high proportion of special-needs students, "regular" physics, and honors physics) as well as senior physics, AP Physics B, and AP Physics C. Or the US History 1 benchmark exam being used for modified, regular, and honors levels.
- Tests that will take time away from classroom instruction, as they will be administered during class periods. On the model that two per quarter are given, that is 8 days for testing taken away every year.
But what is happening now is that the question has taken on a new reality. There are real, practical consequences for a teacher who does not produce sufficient "growth" among their students. The answer to the question above is now based on a very simple model: a student comes in at baseline knowledge level X, the teacher does their thing, the student reaches new level of knowledge Y, and the growth for that student = Y minus X. Aggregated over all students that a teacher has, the teacher's ability to produce growth can also be measured. And so teachers who produce higher levels of growth receive concomitantly higher ratings.
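To make the simple model concrete, here is a minimal sketch in Python. The function names and the roster scores are hypothetical, invented purely for illustration, and the code assumes that "aggregated" means a plain average of each student's Y minus X. An actual evaluation system may well aggregate differently.

```python
from statistics import mean


def student_growth(baseline, final):
    """Growth for one student under the simple model: Y minus X."""
    return final - baseline


def teacher_growth(roster):
    """Average growth over all of a teacher's students.

    `roster` is a list of (baseline, final) score pairs.
    Averaging is an assumption; the model only says "aggregated".
    """
    return mean([student_growth(x, y) for x, y in roster])


# Hypothetical scores, made up for illustration only.
roster = [(40, 65), (70, 90), (55, 60)]
print(teacher_growth(roster))
```

Everything the model knows about a student is that pair of numbers. Nothing else about the student, the class, or the quarter enters the calculation.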
There are two sets of problems with this. The first I've already mentioned: what is meant by student growth anyway?
Assume, for the heck of it, that there is a valid answer, and that in some fashion it can be measured. The second set of problems revolves around the translation between student growth and teacher performance. These issues have been widely discussed, and where you fall on the spectrum of belief that performance drives growth seems largely to align with your political views if you're not a teacher. Using a crude model, an r value of 1 means that teacher performance perfectly correlates with student growth, while an r of 0 means that there is no correlation. (An r of -1 is scary to contemplate.) I am of course ignoring the fact that correlation ≠ causation in general, but market-driven reformers tend to ignore that.
Anyway, my rough estimate is that market-driven reformers tend to put r close to 1, the general public puts it at about 0.7, and teachers put it at about 0.5.
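For readers who want to see what those r values mean, here is a rough sketch of the Pearson correlation coefficient computed by hand. The "teacher performance" and "student growth" numbers are invented for illustration and are not real data; the variable names are likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented numbers: five teachers with a made-up "performance" rating.
performance = [1, 2, 3, 4, 5]

growth_perfect = [10, 20, 30, 40, 50]  # growth rises in lockstep with performance
growth_inverse = [50, 40, 30, 20, 10]  # growth falls as performance rises
growth_no_linear = [4, 1, 0, 1, 4]     # no linear relationship at all

for label, growth in [
    ("perfect", growth_perfect),
    ("inverse", growth_inverse),
    ("no linear relationship", growth_no_linear),
]:
    print(label, round(pearson_r(performance, growth), 3))
```

Note that even the first case says nothing about cause; it only says the two lists move together.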
There are three major flaws that I see:
First, the idea that the growth of a student can be determined using this model is massively oversimplified. It might be considerably more difficult to take a student from 70% knowledge (whatever that means) to 90% than it is to go from 0% to 20%. But maybe not - maybe it's the other way around. Who knows? But in this model, the teacher has produced 20% knowledge growth in both cases.
Second, the idea that the student plays no role in their own growth is implicit in this model. The use of these benchmark exams for teacher evaluation gives support to that contention. If there truly were recognition that students came with different levels of motivation (either internal or evoked by the teacher), for example, then there would be no way to compare students - which would mean no way to compare teachers. What, for example, is learned by comparing two students who went from 70% knowledge to 90% knowledge, when one of them busted their behind to "get it" and the other simply was able to absorb the material presented to them and make sense of it? Which teacher did a better job? What if the first student was motivated by the teacher to put in that effort, while the second disliked the teacher but had intrinsic interest in the subject?
Third, the idea that external factors are irrelevant is also implicit. What happens to the teacher who has a student whose parents get a divorce? Or whose family is going through a job loss or other financial crisis? Or who has a serious illness or death in the family? That student, understandably, may not have their full attention on their academic work; their growth during that time may be less than it would otherwise be. In a data-driven evaluation model, that won't matter.
It can be claimed, perhaps, that I'm creating a strawman argument: perhaps no one seriously believes that teacher performance as determined by test scores will be evaluated in such a simpleminded way. I'd be happy to be convinced.
(Note: As I am finishing this essay, a new post from Larry Cuban just arrived in my inbox, titled "Algorithms, Accountability, and Professional Judgement (Part 3)". I think it fits in well with this post, and if you've read this far I think you should read Dr. Cuban's post also.)
As a public school teacher beginning to think about upcoming benchmark assessments, I read this post with great interest. Your insights brought a few thoughts to mind:
1) Implicit in the use of percentages to measure student/teacher achievement is the idea that 100% is an attainable goal. One should question whether 100% is even a meaningful concept in this context. Is it not rather absurd in curriculum development to list a finite set of objectives for a course and pretend that 100% "knowledge" of those objectives is possible? When 100%--whatever that means--has been attained, is that the end of the effort?
Did not Socrates teach us that true wisdom is the realization of how little we know? (That rings true with every experience in life.) A modicum of honest modesty would go far in curriculum planning and assessment, as well as in our approach to teaching. A teacher should inspire passion for learning, but learning that always points toward the infinite well of knowledge. How dull and tedious to lay out N objectives and plow through them, as if that were the end of the matter. Finite sets of objectives cannot inspire passion. Yet these benchmark assessments have that concept at their heart.
2) You characterize these innovations as "market" driven. I would adjust that term to "market-like". The businessmen-turned-educationists do have some sense for how good markets run. They devise a market-like model, but then use law to enforce it in the least market-driven sense imaginable. The use of "market" to describe exactly the opposite is a cruel joke.
3) The saddest part of these adjustments to teacher evaluations is the inevitable loss of the wise and inspiring teacher. When teacher evaluations are competitive and based primarily on objective tests (poorly designed tests, as you predict), an industrious teacher right out of college can "beat" the 25+ year veteran. As Banesh Hoffmann predicted in the 1960s with the rise of standardized testing, mediocrity will tend to rise to the top.
Veteran teachers have something important called "life experience" and should be unafraid to tap their wisdom to instruct children beyond their math or science or writing curriculum from time to time. Many veteran teachers develop a parent-like love for their students (often after having raised children of their own) and can treat their students with a deeper kindness, respect and sensitivity that only comes with many years of experience. Sadly, as the cult of objectivity gains ground, these un-testable skills are valued less and less by nearly all stakeholders.
Thank you for those thoughtful comments. Ultimately, the idea of a curriculum means that there is a certain body of knowledge that someone (the "powers that be") thinks is necessary for the young people in a culture to know. My sense is that we've gone through periods historically where the curriculum was completely dictated, and other periods where the curriculum was much more open-ended. We're in one of the former periods now, and benchmarks are just one symptom of that.
I'd also agree with your point #2, although it strikes me that very little in the relationship between corporations and the government and the public is truly "market"-driven in general; that is, it's not just in the educational world where businesspeople use law to create and protect their markets in whatever. We certainly do not live in a Randian capitalist "paradise" (well, except for the 1%'s paradise, anyway); it's much more like an oligarchy or plutocracy.
Your last point is one that I've discussed with several of my colleagues quite often. The model I foresee is that, as this process continues to evolve unabated, ever-increasing numbers of schools will be largely populated with wide-eyed TfA grads, who teach for a year or two before heading to greener pastures to do what they really want, and a small minority of teachers close enough to retirement to hang on out of inertia and cynicism. Sad, really.