Evolving Government: Leveraging a data catalog to understand your agency’s data

Op-Ed: Stephanie McReynolds of Alation explains how data catalogs allow organizations to easily find data, infer its context and track the path of data usage.

From digital records and information systems to new sources of data – like sensors embedded in parking meters, streetlights and yes, even trash cans – data in the public sector is increasing at a tremendous pace. Public sector organizations at every level are seeking to take advantage of all this data. But while the amount of data being created and captured has grown exponentially, the human capacity for parsing insights from that data has not kept up.

In the business world, Alation’s research shows that knowledge workers need between three weeks and two months to find and understand the data sources they need. This problem is even more apparent in government, where data is siloed across agencies. Analysts are forced to spend most of their time tracking down the right dataset, searching endlessly through rows and columns and needlessly duplicating efforts. And, eventually, they lose trust in data. The challenge becomes even more daunting when considering machine learning and artificial intelligence initiatives that require data scientists to find machine-readable data formats.

Agencies need a foundation for self-service analytics where users can easily find and understand data, along with proper governance to ensure that data use is well-managed and adheres to both policy and best practices.

The Alation Data Catalog helps solve this problem with an automated data catalog that leverages machine learning to understand data usage. A data catalog allows organizations to easily find data, infer its context and track the path of data usage (otherwise known as “lineage” by data management experts). The data catalog creates a living inventory that continually updates with information on how data is being used, enabling users to know exactly what data they can access and whether it’s relevant to their needs – a critical first step toward open and efficient data use.


As an example of this in action, the City of San Diego has been named the nation’s top-performing data-driven city on the back of its award-winning projects, such as its StreetsSD street maintenance initiative. For the City of San Diego’s Chief Data Officer Maksim Pecherskiy, the first step to fulfilling the City’s commitment to open data was implementing a data catalog, providing an easy way to inventory and access all of the City’s data. The City’s data resides in multiple databases and includes complex formats, like geospatial data, and data from IoT devices, like smart parking meters. Using a data catalog helps Maksim ensure that all of the City’s 11,000 employees and 35 departments can leverage these diverse sets of data and systems from one single source, ultimately improving the democratic process through the open sharing of information with the public.

The data catalog has had a tangible impact on the lives of San Diego’s citizens. One advanced analytics project found the optimal delivery routes for vehicles traveling between city facilities. Innovations like this reduce the cost of city services, minimize environmental impact, improve the health and welfare of city workers and enhance the lives of San Diego’s citizens.

A data catalog, of course, isn’t effective if it isn’t used. To be truly open and data-driven, each employee must take responsibility for his or her own data usage. That means raising the bar on data literacy and establishing programs and training that teach employees how to promote, share and use data in ways that adhere to policy and best practices.

Stephanie McReynolds is VP of Marketing for Alation — a data cataloging platform that automatically indexes your data by source and allows you to easily gather knowledge about your data, using machine learning to continually improve human understanding. In addition to the City of San Diego, Alation has helped several of the Fortune 500 better understand and leverage their data. Alation is now participating in the Dcode program, a government-focused tech accelerator, to help solve big data problems for the federal government.
