David McClure, Associate Administrator of the U.S. General Services Administration (GSA) Office of Citizen Services and Innovative Technologies (OCIST), sat down with WashingtonExec to discuss the explosion of big data and how it relates to the federal government.
McClure also covered Data.gov, USA Search, and FedRAMP.
WashingtonExec: Do you have any big data pilot programs coming out this year or next year?
Dave McClure: One thing that we are running that ties directly into the big data phenomenon is our USA Search capability. USA Search stores search logs in a Hadoop application that analyzes search strings and user behavior. The application can generate standard reports, such as the most common user queries, which can change daily. It then feeds the results back into the search engine to generate improved type-ahead suggestions. The new search engine delivers 95% of its results in less than 650 milliseconds.
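To give a flavor of the kind of log analysis McClure describes, here is a minimal sketch of a Hadoop Streaming job that counts query frequencies. The tab-delimited log layout (timestamp, then query) and the field positions are assumptions made for illustration; USA Search’s actual schema is not described in the interview.

```python
#!/usr/bin/env python3
# Hedged sketch: count query frequencies from search logs via Hadoop Streaming.
# Assumes tab-delimited log lines of the form "timestamp<TAB>query".
import sys

def mapper():
    # Emit each normalized query with a count of 1.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            query = fields[1].strip().lower()
            if query:
                print(f"{query}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so identical queries arrive together.
    current, count = None, 0
    for line in sys.stdin:
        query, _, n = line.rstrip("\n").partition("\t")
        if query != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = query, 0
        count += int(n or 1)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()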
WashingtonExec: How do you think the federal government is doing in terms of seizing big data opportunities? Do you think it is neck and neck with the private sector?
Dave McClure: It’s a little bit of a mix. It’s hard to answer that, quite honestly, because the government’s role in big data is quite diverse. We have some of the largest data holdings in the world: Defense, Science, Health, Agriculture, Geospatial. From that perspective it rivals anything you would see in the private sector in terms of volume, size and perhaps even importance. Additionally, lots of agencies run big programs with lots of information that could be mined and examined, not only from a service delivery and management perspective but also from a fraud, waste and abuse perspective. Progress is being made in both areas.
WashingtonExec: What have you learned from launching Data.gov in terms of democratizing data? How has it shaped your approach to upcoming programs?
Dave McClure: When Data.gov emerged a couple of years ago under the banner of open and transparent government, it was revolutionary: putting huge amounts of machine-readable data online for easy access and use. We have close to 500,000 datasets loaded into Data.gov. Now we have to turn our attention away from the number of datasets, and from bulk uploads and downloads to a central site, and instead focus on the use of the information and the value proposition it can create. We are beginning to see lots of datasets being used by third parties to create new applications and new services that provide value to citizens and businesses and help establish a value chain for the data collection. It is similar to the NOAA weather service model. All of the weather service entities, like the Weather Channel and WeatherBug, use data produced by NOAA; it is the underlying source for all commercial weather data. As such, we’ve created a value proposition where people can mine the data, create an interesting application and use of it, and either give it away free or sell it for whatever they think the market can bear. The government wins because the taxpayer dollars spent collecting that data are being used in high-value ways that affect citizens’ lives every day.
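To make the third-party use case concrete, here is a minimal sketch of how a developer might discover datasets programmatically. It assumes the catalog at catalog.data.gov exposes the standard CKAN package_search action; the query term and printed fields are purely illustrative.

```python
# Hedged sketch: search the Data.gov catalog via the standard CKAN API.
import json
import urllib.parse
import urllib.request

def search_datasets(query, rows=5):
    """Return (title, resource URLs) for datasets matching a search term."""
    url = ("https://catalog.data.gov/api/3/action/package_search?"
           + urllib.parse.urlencode({"q": query, "rows": rows}))
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)["result"]["results"]
    return [(ds["title"], [r.get("url") for r in ds.get("resources", [])])
            for ds in results]

for title, urls in search_datasets("weather"):
    print(title, urls[:1])
```

From the returned resource URLs, a developer could pull the raw files and build exactly the kind of derived application McClure describes.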
WashingtonExec: Do you think that big data is breaking down information silos?
Dave McClure: I think it does, but not automatically. Most big data is aggregated from more than a single source; and where it does come from a single source, we are finding that it has multiple uses across other entities. A good example is taking massive geospatial datasets and geocoding other information against them, like crime, health and air pollution data, so you are taking data from different areas, meshing it together and creating a completely new value proposition for information. You are presenting it through visualization and analytics that have a great deal of meaning to a diverse audience, not just a single audience. I think that’s the trend that’s helping break down some of the traditional function-by-function or topic-by-topic approaches.
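As a hedged illustration of that meshing, the sketch below joins two invented geocoded datasets, crime incidents and air-quality readings, by snapping coordinates to shared grid cells. The data values, cell size and field layout are all assumptions made for the example.

```python
# Hedged sketch: "mesh" two geocoded datasets by binning coordinates into
# 0.01-degree grid cells and joining on the cell key. All data is illustrative.
from collections import defaultdict

def cell(lat, lon, size=0.01):
    # Snap a coordinate to a grid cell so nearby points share a key.
    return (round(lat / size), round(lon / size))

crimes = [(38.907, -77.036), (38.907, -77.037), (38.889, -77.050)]
air_quality = [(38.907, -77.036, 42.0), (38.889, -77.049, 18.5)]  # (lat, lon, AQI)

crime_counts = defaultdict(int)
for lat, lon in crimes:
    crime_counts[cell(lat, lon)] += 1

# Join on grid cell: each row pairs local crime volume with local air quality,
# a combined view neither dataset offers on its own.
for lat, lon, aqi in air_quality:
    key = cell(lat, lon)
    print(f"cell={key} crimes={crime_counts.get(key, 0)} aqi={aqi}")
```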
WashingtonExec: How are the top three buzzwords (cloud computing, mobility and big data) related, from your perspective?
Dave McClure: I think cloud computing provides the raw processing, storage and analytical capabilities that big data is looking for. We used to have to run big data on mainframes and supercomputers, and still do for a lot of really heavy-duty modeling and analytics, but you are now finding that cloud computing offers unbelievable compute power and accessibility at a fraction of the cost of what you used to pay. You can crank through huge volumes of data in cloud computing environments that take advantage of economies of scale and produce results in near real time. It has really revolutionized not only the storage but also the ability to compute and deliver results in increasingly fast time frames; it’s just unbelievable what can be done in mere seconds on these large datasets. In-memory analytics and databases are revolutionizing the big data space.
Mobile computing is rapidly becoming the delivery channel. Smart devices and the mobile world are really changing the mechanism, and that creates challenges for us because you don’t have the screen real estate, and you don’t have the same capability on every smartphone that you would on a larger device. We don’t even have to think of it as just a delivery device; you can actually download data to a smart device and do additional analysis, or use it in different ways with software on that device. Convenient access to data and ease of use have been revolutionized by both cloud and mobile computing.
WashingtonExec: Have you all been using big data technology for FedRAMP?
Dave McClure: We have not. I think it will play a huge role in the future of FedRAMP, and it will play a huge role, and already is, in the security area. One of the trends in security is the move toward continuous monitoring: you are collecting real-time data on the security status of a provider’s system or its infrastructure, and you want to know that status on a real-time basis, not monthly or quarterly or annually. That means you are collecting incredibly sensitive, and very fast-moving, information on vulnerabilities, threats and the actual conditions of an operating environment. You are going to be collecting so much data that the ability of a human to pore through it is very limited, so you are going to see smart algorithms and business logic trying to find trends and anomalies that point to threats or vulnerabilities, and getting them to a human for action as fast as possible. That’s going to revolutionize the security side of the house, and the FedRAMP office will be working with DHS and the agencies to make sure those kinds of capabilities are in place. We won’t be the steward housing all of the data from all of the agencies, but we will help play a role in making sure that as things are discovered they are communicated as widely as possible government-wide. It changes the entire security posture of the federal government to a real-time basis, which will be very, very rewarding for us.
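For a sense of what that automated triage might look like, here is a minimal sketch, assuming a continuous feed of a single security metric (failed logins per minute is an invented example): a rolling z-score flags readings far outside recent history so only the anomalies reach an analyst. The window size and alert threshold are illustrative assumptions, not FedRAMP specifications.

```python
# Hedged sketch: flag anomalies in a continuous monitoring feed with a
# rolling z-score, so humans only review outliers instead of raw data.
from collections import deque
from statistics import mean, stdev

def monitor(feed, window=30, threshold=3.0):
    """Yield (index, value, zscore) for readings far outside recent history."""
    history = deque(maxlen=window)
    for i, value in enumerate(feed):
        if len(history) >= 5:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value, (value - mu) / sigma
        history.append(value)

# Simulated feed: steady background noise with one injected spike.
feed = [10, 12, 11, 9, 10, 13, 11, 10, 95, 12, 11]
for i, value, z in monitor(feed):
    print(f"minute {i}: {value} failed logins (z={z:.1f}) -> alert analyst")
```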
WashingtonExec: Do you think new big data capabilities are keeping pace with current cybersecurity standards?
Dave McClure: Big data analytical capabilities are expected to play a significant part in the expanding role of continuous monitoring, both for government buyers and government providers. Real-time monitoring feeds designed to track the operational health of networks and systems, and to detect and respond to advanced persistent threats, will necessitate powerful, robust analytical engines. With the advancement of new learning-oriented algorithms, responses and resolutions to vulnerabilities can be accomplished faster and more efficiently, though human interpretation and knowledge will continue to play a critical role.
WashingtonExec: What is something most people might not know about you?
Dave McClure: I’m a member of what I now call my 6×6 club. I run six miles before 6:00 every morning. I used to be a 5×5 but I can’t do that anymore. It’s a hard thing but I try to keep building at it.
WashingtonExec: What books have influenced or impacted your career?
Dave McClure: The Heart of Change by John Kotter. It really explains the difference between people who are technologists and people who are managers and executives, and how you have to transcend the failure to communicate between the two. It’s a very good book on teaching technology-focused people, in particular, how to communicate with non-technology people in very convincing ways. The other book I will point out, which I was on a panel discussing, is Little Bets, a book about innovation. It tries to convince people to break away from the grand-idea approach of big projects, the grandiose multi-month, multi-year types of activities, and instead break innovation into bite-sized chunks and deliver constantly. That’s what we sort of do in this office, and it’s a good book for the times right now since everybody is focused on innovation.