Four years ago, the largest school cheating scandal in U.S. history shook Atlanta. Nearly 180 teachers and administrators at 44 of the city’s 56 public schools were implicated in a scheme to correct wrong answers given by students on standardized state tests, which were being used to evaluate each school’s performance under the federal education law known as No Child Left Behind. Some of the details seemed beyond belief, including parties at a principal’s house where school staff got together to change incorrect responses that students had penciled onto the test sheets.
The scandal upended what had appeared to be a decade of progress in which Atlanta schools made greater improvement in student test scores than any of their major metropolitan peers. Superintendent Beverly Hall had been recognized as the 2009 Superintendent of the Year by the American Association of School Administrators, eight months before The Atlanta Journal-Constitution published its first story calling the school district’s test-score gains into question.
This March, Hall was indicted on conspiracy and other charges, including making false statements and theft; the theft charge stems from the nearly $600,000 in student-achievement bonuses she collected during her decade-long tenure. She faces up to 45 years in prison if convicted. She has pleaded not guilty.
Worst of all, Atlanta isn’t alone. The El Paso, Texas, school board was stripped of its authority last December after its members failed to detect a scheme led by the superintendent to prevent low-performing students from coming to school on test days in order to meet the proficiency requirements under No Child Left Behind. School staff in Columbus, Ohio, are being investigated for manipulating student attendance data to remove students with poor test scores from their rolls. And a few months ago, school officials in the District of Columbia announced they had detected cheating at 11 schools in the last year and presented evidence that organized cheating could have been as widespread as it had been in Atlanta.
Now the public education system is preparing for a fundamental overhaul that will make the policing of student tests even more critical. At the end of the 2014-2015 school year, most states will administer exams based on the Common Core State Standards, new national academic criteria that have been adopted by more than 40 states since they were introduced in 2009.
The arrival of new tests at the same time that many states are introducing new teacher evaluation systems raises several questions. Can the new tests be cheat-proofed? Should states wait before using them to assess teachers? Should policymakers take this opportunity to rethink teacher evaluation altogether? A great deal depends on the answers, and especially on whether cheat-proof tests turn out to be feasible. If they are, the scandals of the last few years may prove to be a historical anomaly, unfortunate growing pains as America learned how best to judge what happens in the classroom. But if more Atlanta-style misconduct emerges, the entire new system could be discredited.
There is one obvious explanation for this proliferation of cheating: the high stakes of the testing, specifically the use of student test scores to evaluate teacher and school performance. That practice became institutionalized with No Child Left Behind’s Adequate Yearly Progress system, which demanded that schools make advances in student achievement or face financial penalties. It has continued under the Obama administration, which has required states to set statewide teacher evaluation standards that include student performance in order to receive Race to the Top grant funding or, somewhat ironically, a waiver from No Child Left Behind’s requirements.
Teachers in Atlanta blamed the pressure to meet high performance standards for the systematic cheating there. While some quibble over exactly how much blame high-stakes testing deserves for such misconduct, most agree that it has played a key role in the scandals popping up across the country.
“This is what happens when you put so much stress and so much value on one thing,” says Adriane Dorrington, senior policy analyst at the National Education Association, the country’s largest teachers union. “If teachers feel like their lives are on the line, they’ll do whatever they have to do to make it work. The system has created this.”
No Child Left Behind began the era of high-stakes testing, but a 2009 report called “The Widget Effect,” written by a nonprofit known as The New Teacher Project, forever married that issue to the question of teacher evaluations. The report documented the failure of the then-popular model for evaluating teachers, which consisted largely of one-off observations and included no measure of student performance. Nearly every teacher, 99 percent of them, received a positive evaluation under that system. But as the United States continued to fall behind its international peers in educational attainment, it became increasingly clear that the system had to change.
The most obvious missing piece was some measure of how a teacher was affecting his or her students’ learning. No Child Left Behind had conveniently introduced annual standardized tests that were already being used to assess a school’s performance. Using the same tests to monitor a teacher’s work was a logical next step. That approach became federal policy when the Obama administration tied billions of dollars in funding and crucial waivers from the unpopular No Child Left Behind law to a state’s willingness to evaluate teachers according to how their students perform. As a result, nearly 40 states have committed to the practice over the last four years.
It’s been an unruly process, which arguably came to a head this spring when seven teachers sued the state of Florida because their evaluations were based in part on the test scores of students they had never taught. The teachers argue that the policy violates their constitutional equal protection rights because they could theoretically be passed over for a raise or even laid off based on those assessments. Teachers unions in other states are watching the litigation, ready to press forward if the courts show sympathy for the teachers’ arguments.
Meanwhile, in an attempt to address some of those concerns, pro-reform groups such as the Gates Foundation have been looking for ways to blend tangible metrics like test scores with the less easily measured art of teaching. Long-term observations and student satisfaction surveys are popular approaches. But no single model has emerged. “These systems have been broken for decades, and they will take time to fix,” says Tim Daly, president of The New Teacher Project. “They will have to evolve as we learn more.”
The scandals in Atlanta and elsewhere have undercut this push toward more empirical teacher evaluation. “Where we’ve seen the real problems have been in places where they’ve placed huge emphasis on the test scores and where the system has not built up a culture of trust,” says Doug Staiger, an economics professor at Dartmouth College who focuses on education. “We have to emphasize that it’s not just about achieving targets on test scores, but about more effective teaching and helping the kids.”
All of this explains why the new tests to be offered for Common Core next year are seen by some as a reset button. The American Federation of Teachers, the nation’s second-largest teachers union, has called for a moratorium on the use of test scores to evaluate teachers (and to make personnel decisions) while the new assessments are being implemented. Even the U.S. Department of Education, a staunch advocate of measuring teachers based on student performance, has issued initial guidance that would effectively allow states to apply for waivers from teacher evaluation commitments they made earlier. But other reformers have balked at the idea of hitting the pause button on the recent movement toward results-based evaluations.
South Carolina is one place where this drama is playing out. It is one of nearly 40 states that received a No Child Left Behind waiver and in turn agreed to set statewide standards for evaluating teachers based in part on annual test scores. Pilot programs for the new evaluations were launched in 22 schools during the 2012-2013 school year and will expand to 50 schools this year before going statewide in 2014-2015, just as the state is implementing the new Common Core tests. The South Carolina Education Association, the state’s main teachers union, is already decrying the idea of evaluating teachers based on how their students perform on tests that have never been administered before.
“Teachers in schools that did not have a pilot year will be going into this in the dark,” says Jackie Hicks, the union’s president. “Evaluators will expect competency for something they have never done.”
South Carolina officials are undeterred, reflecting a broader belief within the reform community that new assessments aren’t an excuse to take a break from accountability. But they have built in a few provisions aimed at making the transition fairer for teachers and, they hope, at discouraging any scandal on the scale of Atlanta’s in a state that so far has turned up only a few isolated incidents of cheating.
The new evaluations, for instance, can’t be used in South Carolina to make personnel decisions, such as not renewing a teacher’s contract, until a teacher has failed to meet expectations for two years. This means no teacher could be fired over the Common Core assessments until after the 2015-2016 school year, which state officials argue will give teachers time to make the full transition to the new tests.
“We’ve had new tests before. We’ve had new standards before, and there have not been pauses in accountability,” says Jay Ragley, deputy superintendent of the South Carolina Department of Education. “We see this as no different.”
As the teachers unions and education reformers battle over what the new tests mean for teacher evaluations, the test-makers themselves, the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers (each of which counts 20 or so states as members), have worked to keep their tests from being exploited the way the paper-and-pencil exams of the past were.
For starters, both tests are computer-based, eliminating the possibility of erasure parties in the principal’s living room. Smarter Balanced is also an adaptive assessment, meaning the difficulty and order of questions change depending on a student’s answers to previous questions. In other words, no two students take exactly the same test, and proponents believe this all but removes the opportunity for cheating. How well the new tests resist cheating will be debated during the first years of implementation, but there is broad hope that they will significantly reduce the likelihood of fraud.
If these new assessments can fulfill that promise, the conversation can turn more fully to the question of what constitutes a fair teacher evaluation. Guidelines from the Gates Foundation’s Measures of Effective Teaching project, which combine multiple measures such as classroom observations and student satisfaction surveys with an assessment of student learning, have laid the foundation for that debate.
There is also an ongoing conversation about how much flexibility states should give school districts in assessing their teachers. But the point is this: Everybody, from union leaders to hard-line reformers, agrees that there is no going back to the days before “The Widget Effect.” These next few years will be a learning period for the next generation of teacher evaluations, as their designers experiment to find the right mix of metrics.
“As this starts being used to improve schools, the debate won’t be about whether we should use test scores, it will be about whether there are better ways we can use them,” says Dartmouth’s Staiger. “People are going to realize that there’s no perfect measure. But if we put more of them together, we’ll start to get a better picture.”
While that national debate continues, school officials in Atlanta and other cities must work to overcome the stigma of their cheating scandals and move on. Atlanta officials have crafted a multipronged plan to prevent a repeat of the 2000s meltdown. A tight security chain will track tests from their delivery to the school building through the examination itself. Administrators will be required to sign legally binding documents, and the number of personnel with access to the tests before and after they are given will be limited. Atlanta’s schools are also developing a plan for how those precautions will translate to the new Common Core assessments.
“My effort from here will be to continue to drive this culture change throughout the organization,” Erroll Davis, who was appointed to replace Hall as superintendent, wrote in a letter to parents last year. “Only when that happens will we be able to finally move beyond this scandal with a minimal risk of reoccurrence.”
So the city’s rehabilitation is underway. The national conversation is changing. In a few years, the scandal in Atlanta might be a distant memory. At least that’s the hope.