Cheaper storage and more powerful analytic tools are making big data an increasingly viable concept for the public sector. Some police departments, for example, already mine high-volume, high-variety, high-velocity data sources to support predictive policing. Public officials also hope to deploy big data in public health, education, corrections, infrastructure management, citizen engagement and many other arenas.
Extracting knowledge from data stores has become an essential function, said Rick Davis, CIO at the Virginia Department of Corrections (DOC). “We started accumulating all this data, and then people started saying, ‘We’d like to report from this,’ or ‘We want to do some trend analysis.’”
Still, governments that want to harness the power of big data face some serious obstacles. One is that governments collect and store data in so many different formats.
“You could have completely unrelated systems that might not be able to communicate or transmit data effectively between them,” said Bernard Soriano, deputy director of risk management at the California Department of Motor Vehicles (DMV). For instance, a project might need to extract some data points from a mainframe system and others from a Microsoft Excel spreadsheet.
And much of the information that governments want to tap doesn’t even reside in structured databases. Text documents, emails, photos, videos and posts on social media sites all offer rich material for analysis.
“If you want to integrate the data in the right way — maybe through an in-memory platform with real-time data — you need to make sure the data can be useful,” said Dante Ricci, a director at SAP, which provides technologies for extracting data from both structured and unstructured sources. “It’s a much more manual effort if you have to take data that’s not in standard format. That’s a big problem.”
SAP and other companies offer tools to help customers overcome that hurdle, said Ricci. Alternatively, governments could use application programming interfaces to make data from numerous sources available to users in standard formats, he said.
A related challenge stems from the fact that certain kinds of insights rely on data that’s owned by multiple organizations. Nancy Staisey, vice president of Smarter Cities North America at IBM, points to the value that the Memphis Police Department gained when it started integrating weather data into its crime analytics system. Among other things, the police learned that thieves steal more cars when it rains.
To access data that they don’t own, agencies might need to forge agreements with other agencies or jurisdictions, or buy data from commercial vendors. None of that is necessarily difficult, Staisey said. But leaders don’t always appreciate the value to be gained by looking beyond their own four walls. “This is an issue of culture and orientation — a willingness of people across an organization to combine and share their data, and being aware that there is data outside your organization that you may not own, that can be very helpful,” she said.
State Your Purpose
The need to ensure security and privacy poses challenges for all kinds of government IT initiatives. Big data projects raise even more issues because they often employ personal data that was gathered for an entirely different purpose, said the DMV’s Soriano. “I have collected it from you with the pretense of doing something — let’s say, to verify your existence within the Social Security Administration’s database,” he said. “If I use that information for something else — in some big data project — I’m not sure I have the authority to do that.”
To cover this kind of scenario, governments must establish new policies, he said. “But they need to be fully vetted, and they need to be transparent, so there is no sense of misuse or any type of illusion about using the data in some other fashion than it was intended.”
When a citizen gives data to an agency for one purpose — like to apply for a driver’s license — and the agency plans to use that data for other purposes, that agency must make some kind of disclosure, Soriano said. “That’s a problem for most governments. There’s a purpose and mission for an organization. The activities that the organization conducts need to be in line with that mission and purpose.”
The more data an agency collects and stores, the more careful it must be about who gains access to that data, said Davis. The DOC keeps that principle in mind as it creates tools that let high-end business users pull reports from its data stores without help from the IT department. “You want to be able to track who’s doing what reporting,” he said.
The DOC has formed a governance group that meets regularly to determine who may access data and for what purposes. Those discussions cover physical access — as when someone makes a backup — as well as the right to create reports for various reasons, Davis said.
How Long Is Too Long?
Agencies that create massive data stores also grapple with how long to retain information.
“We just retired one of the mainframe systems that had data going back to the mid-’70s,” said Davis. Even when organizations purge their archives, they often forget about duplicate records stored in other locations. “Ultimately there is a cost to maintaining that data and storing it,” he said. “It’s the multiple copies of data all over that I think are going to get people into trouble.”
Agencies will need to make hard decisions about which data to keep, and for how long, to support useful analytics, Davis said. “There’s no reason that we have data from somebody who may not even be in this world now, from 40 years ago.”
In the days when governments kept most records on paper, the need for retention policies was obvious, Davis said. “You could completely fill warehouses with nothing but old documents. There was a huge expense to that.” It’s easier to ignore a glut of electronic information, because people don’t see that material, he said. “They just want more and more storage space.”
People argue that storage is cheap and getting cheaper, Davis said, but with terabytes of data all over the place, ultimately there’s a cost. And taxpayers bear it.
“I think we have to have really strong records retention policies that include electronic data,” said Davis. “And then organizations are going to have to invest in technologies that will allow you to strip data out of these relational database systems without having orphan records, and then truly purge it.”
Government policies on data security, privacy and retention often lag behind an organization’s technology needs because policy takes so long to change, said Ricci. But the slow pace of policy evolution isn’t entirely a bad thing. As stewards of public information and taxpayer dollars, governments must move cautiously. “It’s OK sometimes that the policy is behind the technology, because the public sector has to keep the citizen in mind first,” he said.
As governments wrestle with policy decisions, they may also struggle to find employees with the know-how to harvest insights from their data.
“I think there’s a dearth of qualified talent,” said Usha Mohan, CIO of Jacksonville, Fla. Mohan was a co-founder of IxReveal, a company that develops software for conducting big data projects.
The tools that one needs to extract business intelligence from complex collections of structured and unstructured data are not widely known, Mohan said. Even among business intelligence professionals, few understand how to work with text and video.
“We are projecting a very serious gap in the number of skilled people we have in the U.S. who are able to do advanced analytics in a big data world,” said IBM’s Staisey. According to the U.S. Department of Labor, the need for such individuals will grow by 20 percent over a five-year period, she said. By that projection, the nation will have 120,000 to 190,000 fewer people available to do this kind of work than it requires.
“These are the people with the deep technical analytical skills,” Staisey said. “But they’re also projecting a gap of about 1.5 million in terms of managers and decision-makers who understand how to integrate that kind of information into their operations.”
Technology may offer a partial solution to the skills gap. “There are some tools on the market that do a fairly good job of bringing together at least some level of unstructured data with structured data,” Mohan said. Those tools make analytics easier for people who don’t have specific training in that area. “I think that’s the direction we need to go in if we really want to tackle big problems,” she said.
Along with people who can extract meaning from masses of data, governments need data analysts to apply quality controls, making sure their systems are correctly capturing the data they need to answer important questions, Davis said.
Where’s the Funding?
Of course, technology tools and talent cost money, and that’s a scarce commodity in government these days.
“I’ve been really pleased and blessed that my organization has given us resources to build out the data warehouses,” Davis said. “But those resources come from somewhere else in the organization. And those people aren’t always happy.”
Finding extra money to do anything is a challenge these days, said Mohan. One key to unlocking financial resources is to start by attacking small problems and solving them quickly. “It’s about showing success, building credibility,” she said. “The hard part is to show an ROI for analytics. But it’s very important to show it.”
One effective way to demonstrate the power of big data is simply to get started with a project focused on a single area, said Staisey. Good results make a persuasive case with decision-makers. “They see the value, and they want to do more.”
IBM has worked with governments — especially with cities — on big data projects that have demonstrated very powerful ROIs, Staisey said. One example is South Bend, Ind., which installed sensors in its water system to collect data that helps the city avert potentially hazardous overflows of wastewater.
Based on insights gained in its big data project, the city reduced the number of overflows from 27 a year to one, which helped South Bend avoid penalties for wastewater overflows and showed it how to better store and distribute water, Staisey said. “They’ve avoided $120 million in planned infrastructure improvements.”
One piece of good financial news for governments is that because they’ve been slow to embrace big data, few of them have already invested in building data warehouses, said Mohan. So they’re not locked into using older technologies and can spend their dollars instead on the latest-generation systems, such as logical data warehouses. That will give governments an advantage, she said. “The newer processes and techniques provide more user empowerment.”