School Scandals Reveal the Problem with Grading Schools
We measure school performance by test scores because it’s easy. But no simplistic set of A-F grades can ever account for all the intangible ways schools nurture their pupils.
The downfall of Tony Bennett as Florida’s education commissioner represents more than the humiliation of a once-admired school official. It marks a setback for the widely adopted school grading system for which Bennett was a national cheerleader.
Bennett’s misdeeds didn’t occur in Florida, but in Indiana, where he was commissioner in 2012. According to stories by the Associated Press, which Bennett did not dispute, he wrote an email to one of his agency colleagues demanding a higher grade for a school run by one of his major campaign donors. “Anything less than an A for Christel House,” Bennett allegedly wrote, “compromises all of our accountability work.” The founder of Christel House, philanthropist Christel DeHaan, had donated $130,000 to Bennett’s campaign for re-election as state school superintendent. The grade was quietly changed from a C to an A. Ultimately, Bennett was defeated in his bid for re-election, and in 2012, he moved to Florida to take a similar position.
Bennett initially tried to ride out the storm over his Indiana email, but that strategy didn’t get him far. At the end of July, three days after the AP story about the email broke, he unceremoniously resigned his Florida position.
What’s less well known is that Bennett’s resignation over the Indiana episode wasn’t the only embarrassment that plagued Florida’s school grading system this summer. Early in July, the state school board announced that whatever this year’s Florida Comprehensive Assessment Test (FCAT) disclosed, no individual school would fall more than one grade in the A-F rankings. So if a school got a B in 2012, the worst it could do would be a C in 2013. And so forth.
Florida had made some changes to its annual performance exam in the past year, mostly in the writing category, and the results weren’t pretty. If the numbers had been taken at face value, more than 130 schools would have received a failing grade. Nine of those schools would have dropped all the way from A to F.
You might be asking how a few changes in a test could alter a school’s performance rating from outstanding to abysmal in a single year. Lots of educators in Florida have been asking the same question. Many of them argued, even before the email debacle, that the entire testing procedure was a mess and needed to be taken offline for major repairs. Kathleen Shanahan, one of the original architects of the A-F grading scheme, acknowledged that it was time to pull the plug, if only temporarily. “We’ve overcomplicated the model,” Shanahan said, “and I don’t think it’s statistically valid.”
The Tampa Bay Times newspaper lamented that “after grading schools for 15 years, Florida’s education leaders still cannot get it right.”
One might easily go further and argue that changing the results to make the picture look brighter, whether it involves outright cheating or not, is cause for embarrassment all by itself. If new test questions can have that much effect on a school’s overall performance grade, then why should anybody believe in the integrity of the system?
What’s especially humiliating is that Florida is the birthplace of the school testing movement, the state where former Gov. Jeb Bush decided in 1999 to begin awarding overall letter grades to individual schools to provide information for parents and help assess statewide educational performance. More than a dozen states have begun using a similar system since then, several of them just in the current year. Now they are being told that the Florida model they dutifully copied is too full of flaws to be trusted.
That matters a great deal because a lot more is riding on FCAT test scores than just local bragging rights. If a school receives repeated grades of D or F, it can be required by the state to take a variety of drastic measures, such as making the entire faculty reapply for their jobs, converting the school to a charter or closing it down altogether. So public confidence in the grading process is essential if the state is to have any credibility as a dispenser of draconian educational remedies.
States applying or adapting the Florida model have learned that changing the questions on the test, or switching to a new type of test altogether, can result in wildly fluctuating school grades. School officials in New Mexico this year were delighted to find out that the number of schools receiving A grades had more than doubled in comparison with the results from the year before. Was this the product of innovative new pedagogical techniques? Well, no. It was because the state had abandoned the federally designed No Child Left Behind test and switched to a new one designed by state education experts. Mississippi had a similar experience. Its school test scores went up dramatically because state officials took the expedient step of removing high school graduation rates from the list of test criteria for some schools.
The dramatically higher scores that resulted were a cause for initial state elation. But on further review, they raised another serious question. If the testing process is based on solid educational research, then the results from different tests ought to be reasonably congruent. If the results are dramatically disparate, there is a disturbing suggestion that the people writing the tests aren’t sure what it is they are supposed to be measuring.
Maine is another state that has endured a season of controversy based on the introduction of its new school grading procedures. Gov. Paul LePage, a tireless advocate of school measurement, pushed through a new system this year based largely on the Florida model. Schools were evaluated on student test scores in reading and math; the percentage of students who had shown improvement in their scores during the past year, especially among the bottom 25 percent; graduation rates among upper-level students; and percentage of students who take the national SAT exam.
When the statewide results were tallied, Maine’s schools averaged a C grade, a reasonable-sounding enough score. But when researchers in the state began looking at the results in greater detail, they found something that disturbed them. What the tests were really tracking was demographics. Schools in poorer communities around the state nearly all finished lower than their counterparts in affluent suburbs, regardless of academic methods. High schools that were graded A had an average of 9 percent of their students on free or reduced-price lunch. Schools that got an F had 61 percent of their students receiving subsidized lunches. To a great extent, the test was simply a measure of poverty, not school quality.
“We know that there is a relationship between poverty and lower test scores,” David Silvernail, director of the Center for Education Policy at the University of Southern Maine, told the Portland Press Herald. “It’s been well established. This grading system, unfortunately, just highlights it.”
As in Florida, a special measurement of the test progress of the lower 25 percent of the students was supposed to reduce the impact of demographic differences. But it was a token gesture. A school in a high-poverty area that has to deal with a high transient population and numerous non-English speakers is unlikely to score well on yearly improvement, no matter how hard it tries. It has too many insurmountable demographic factors working against it.
It is hard not to conclude in the end that the school testing movement represents a popular fad in educational policy that is desperately lacking in either substantive methodology or common sense. Its fundamental assumption, underneath all the jargon, is that schools fail because they just aren’t trying hard enough, not because they are being asked to educate pupils who are culturally and socially unprepared to learn. Cooking the books on the tests won’t do anything to solve this problem. All it will do, when the extent of the mischief is revealed, is undermine public confidence in the entire enterprise of school testing.
We have gotten into the business of measuring school performance with precise testing numbers because it’s something we know how to measure. In doing so, we leave aside the subtler and more personal things that teachers and principals do all the time to make their schools function in an orderly way and disseminate as much learning as they possibly can. In the words of Roger Jones, a professor at Lynchburg College in Virginia, one of the states that enacted an A-F grading system this year: “We have gotten so caught up in testing that we have lost sight of a true education.”
But the lessons of the A-F grading fiasco go beyond the assessments of school quality. They apply to the entire problem of measuring government performance. In the past couple of decades, federal, state and local officials have made significant strides in tracking the success of agencies they operate. When it comes to welfare, we now have reasonably good data on the real-life results that recipients encounter, not just a sterile account of the number of cases the agency staff sees in a given month.
But success in performance measurement in one area inevitably breeds enthusiasm for its use in areas where it doesn’t provide meaningful results. School quality is one of those areas. Running a school is not an exact science. No simplistic set of A-F grades can ever account for all the intangible ways schools nurture their pupils, or fail to nurture them.
We measure school performance by test scores because it is easy to get the test scores. Only later do we bother discussing what the test scores mean. One of the positive results to come out of this year’s grading fiascoes is the possibility that we will now have that discussion at an earlier point in the process.