Explore Open Data — Creating Data-driven Governments

This is the second part of our interview with Dr. Ali Farahani. Dr. Farahani is currently the Director of the Information Systems Advisory Board (ISAB), and he was also the first Chief Data Officer in Los Angeles County. You may review the first part of the interview here.

Question: What are open data standards?

Dr. Farahani: There are various degrees of open data standards for different data components. Financial data, for example, already has structured business and data type standards. There is the FASB standard for the private sector and the GASB standard for government. Government open data publishing can follow these standards, which facilitate comparative (cross-sectional) and time-series studies and analyses.

For the broader model, there are two types of standards: the business standard, which has to do with the semantics, meaning, and relationships of the data, and the technical standard, which covers whether I can make my data available in XML, CSV, or JSON so that it becomes machine readable.
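As a rough illustration of the technical side, the sketch below converts a small CSV extract into JSON using only the Python standard library. The column names and values are hypothetical and chosen only for the example; they do not come from any published dataset.

```python
import csv
import io
import json

# Hypothetical sample records; column names and values are illustrative assumptions.
CSV_TEXT = """parcel_id,zip_code,assessed_value
1001,90012,450000
1002,90013,380000
"""


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text into a JSON array of objects keyed by column name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


json_text = csv_to_json(CSV_TEXT)
print(json_text)
```

Note that a conversion like this only addresses the technical standard. Every value is still a plain string, and nothing in the output says what `assessed_value` means, which is the gap a business standard would have to fill.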

The challenge is with the business standard. In the case of financial data, there are business standards for organizations to follow. For example, with FASB, an asset is an asset, whether it is in Los Angeles, New York, or San Francisco. We won’t mix assets with liabilities. But the same business standard does not cascade to other types of data. At this point, when it comes to the development of business standards for open data, there are regional efforts, but there is no systematic national effort to standardize the business context for open data.

Question: How can we break the regional silos and establish national standards for open data?

Dr. Farahani: We have to find a common ground for everyone to start. It could be OASIS (Organization for the Advancement of Structured Information Standards), an international organization for the advancement of structured and standardized data. OASIS has two advantages: first, it is international; second, it already has an existing infrastructure, so we don’t have to start from scratch. Its standard-setting process usually involves large IT companies, such as Microsoft, Google, IBM, SAP, and Oracle.

Question: Linked data is a popular concept in open data. What is linked data and how significant is linked data in delivering open data values? 

Dr. Farahani: Linked data is definitely an area where tremendous business value can be created. Linked Data as a technology is about using the Web to establish machine-readable meaning and relationships. There is also a short-term solution with tremendous value. Right now, a lot of datasets are in silos: this dataset covers X, that dataset covers Y, and another dataset covers Z. You have to link them together to extract any meaningful value. To achieve this, the data publishing entity must provide the linkages when it publishes the data.

Let’s say that I want to do an analysis on homelessness and property values. If I have the property data by zip code and the homelessness data without zip code, then I cannot link them together. We don’t have to link every record, but we have to facilitate enough data dimensions so that data consumers can make necessary linkages. Linked data is becoming a requirement for any meaningful analysis, whether it is about crime, health, or jobs. 
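The zip code example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names (`zip_code`, `median_value`, `count`) and all figures are hypothetical, and the join is a plain in-memory inner join, not any particular linked data technology.

```python
from collections import defaultdict

# Hypothetical figures for illustration only; not real county data.
property_values = [
    {"zip_code": "90012", "median_value": 520000},
    {"zip_code": "90013", "median_value": 410000},
]
homeless_counts = [
    {"zip_code": "90013", "count": 120},
    {"zip_code": "90012", "count": 45},
    {"zip_code": "90014", "count": 80},
]


def link_by_key(left, right, key):
    """Inner-join two lists of dicts on a shared key field."""
    # Index the right-hand dataset by the shared dimension.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Keep only the left-hand rows that have at least one match.
    linked = []
    for row in left:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


linked = link_by_key(property_values, homeless_counts, "zip_code")
for row in linked:
    print(row)
```

If the homelessness records carried no `zip_code` field at all, there would be nothing to index on and the two datasets could not be combined, which is the situation described above.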

Question: What do you think the ideal world for open data looks like?

Dr. Farahani: First and foremost, there has to be a formal commitment by government agencies or a mandate for data publishing. Ideally, it should be like FASB, as it requires companies to publish quarterly or annual financial reports. Second, the mandate has to state that any data that is not personal and would not violate confidentiality laws should be published. If the government published a broad scope of data, people could do great things with that data. Twitter data is proof of this, as it already creates great value in business and research.

Question: What are the top three challenges that prevent us from reaching the ideal world of open data? 

Dr. Farahani: The first challenge is the cost of open data. We should accept the fact that publishing open data is not free, and it should be considered the necessary cost of doing business. 

The second challenge is culture. There is still cultural resistance in some government agencies. They are not ready to adopt open data or make organizational changes to be data-driven.

Third, there are still technical challenges. Publishing data would be easy if the data were stored in an automated system. But in many cases, the publishing agencies have to prepare and clean the dataset, put it into the right format, and then publish it. It is a tedious and inefficient process. Future government IT systems should be able to streamline the open data publishing process.

Question: In your personal experience, what is the number one benefit that has already been realized by open data?

Dr. Farahani: I can think of two examples. One is that there are a number of organizations that have consumed open data since day one. For example, Zillow uses the property parcel data to perform trend analysis; Yelp uses the restaurant inspection data; others use crime data for research purposes.

The more exciting benefit of open data is in business innovations. Entrepreneurs are beginning to look at what data are available to work with. In Hack4LA, people build business and social solutions around open data. It is interesting to see how entrepreneurs and businesses can tap into this vast resource pool. I would also encourage local universities to develop classes on the applications of open data. 

Question: Can you share a story that best embodies the value of open data? 

Dr. Farahani: Of all the datasets that we publish, the one that gets the most attention is the employee salary data. LA County has over 100,000 employees, and almost 40% of our budget goes to labor costs. The salary dataset continues to be the number one hit dataset, and the question is why. I think part of it is the fact that it is a big cost item for the county. Taxpayers always want to know how much the county pays in salaries.

People also get insights from the descriptive statistics of the salary data. They can use the data to answer: Who are the top paid employees in LA County? What are the jobs that they have? What job classification gets the highest percentage of the salaries? It proves that once we make the data available, it leads to new opportunities and new areas to explore.
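For readers who want to try these questions on a salary dataset, here is a minimal sketch of the two calculations: the top paid records, and each job classification's share of total salaries. The records, field names (`job_class`, `salary`), and amounts are hypothetical and do not reflect the actual LA County data or its schema.

```python
from collections import defaultdict

# Hypothetical records; titles and amounts are illustrative, not real county data.
salaries = [
    {"job_class": "Physician", "salary": 300000},
    {"job_class": "Physician", "salary": 280000},
    {"job_class": "Clerk", "salary": 60000},
    {"job_class": "Engineer", "salary": 160000},
]


def top_paid(records, n):
    """Return the n records with the highest salary, highest first."""
    return sorted(records, key=lambda r: r["salary"], reverse=True)[:n]


def share_by_class(records):
    """Return each job classification's fraction of total salary spending."""
    total = sum(r["salary"] for r in records)
    totals = defaultdict(int)
    for r in records:
        totals[r["job_class"]] += r["salary"]
    return {job: amount / total for job, amount in totals.items()}


shares = share_by_class(salaries)
largest_class = max(shares, key=shares.get)
print(top_paid(salaries, 2))
print(largest_class, shares[largest_class])
```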