The (Hidden) Cost of Open Data
For all of open data's advantages, low cost isn't always one of them. But there are ways to keep expenses down.
Los Angeles County announced this January the creation of an open data website that would allow anyone to find information on a host of county government programs, from budget information to welfare data to crime statistics. Supervisor Mark Ridley-Thomas told the Los Angeles Times the county was about to become the “largest municipal government in the nation” to make its data easily accessible to the public.
While the data will be free to the public, the county will spend $319,000 in startup costs, and annual expenses are expected to add another $287,000. For comparison, consider this: California lawmakers in June introduced a bill to establish a statewide open data policy that would affect more than 200 state agencies. An analysis of the bill’s fiscal impact showed the policy would cost the state $4 million to $5 million annually.
Open data has become a big movement in state and local governments. One clear sign of its growing popularity is the rising fortunes of Socrata, a company that helps states and localities set up open data programs and websites. This spring, the firm announced that demand for its products has surged, generating 104 percent year-over-year growth in its customer base.
Adoption of open data policies is linked to two powerful benefits. First, open data makes government more transparent and understandable at a time when trust in the public sector has plummeted. Second, it has the potential to generate significant economic benefits. The consulting firm McKinsey has estimated open data’s economic potential at more than $3 trillion globally.
But before the profits arrive, governments have to cover costs that are sometimes unexpected. Unexpected costs, of course, affect all projects. Arnaud Sahuguet, the chief technology officer of New York University’s GovLab, recently wrote a blog post in which he listed some of the factors that can hide the true cost of open data: unexpected startup costs if data is kept in a legacy computer system that requires reformatting; quality-related costs to keep open data fresh and up-to-date; legal costs to comply with open data legislation; liability costs in case something goes wrong, such as publication of nonpublic information; and public relations costs that can occur when a jurisdiction generates bad press from open data about poor performance metrics or workforce diversity problems.
For California, some hidden expenses have already surfaced. They include staffing a new chief data officer position and spending more than $756,000 over a three-year period to enable employees in the Department of Insurance to interact with the open data portal, inventory the department’s data and redact nonpublic information.
One major way to hold down costs is to limit which data sets are published. Public officials tend to focus on the number of data sets their city or state releases rather than on the effect of releasing a few high-quality sets of data. Some jurisdictions have achieved acclaim for their transparency by publishing select data sets that bear on budget issues, public safety and education. Others have focused on data that has high user participation rates and useful information, which can deliver economic value for startups and established businesses when they reuse data.
Cities and states that do a careful analysis of which data sets have the most impact, both in terms of transparency and economic value, are less likely to be burdened by hidden costs down the road. The selective release of quality data can also improve government efficiency by lowering certain operating costs. The bottom line is that government data can be extremely valuable for public consumption, but only if the policies behind the data are well thought out and the related costs are affordable.