Jeffrey C. (Jeff) Yu is the Director of Technology and Engineering for the Civil Systems Division at Northrop Grumman Information Systems. Yu, a Princeton and Stanford University graduate, is also a founding board member of the Cync early-stage incubator program, an alliance between Northrop Grumman and the Cyber Incubator at the University of Maryland, Baltimore County (UMBC).
In this WashingtonExec interview, Yu gives insight into what exactly “big data” is, why it is a fast-growing industry, and which sectors are now demanding big data solutions and analytics.
Yu also touches on the intricate balance between mobile security and user experience.
WashingtonExec: The concept of big data has been around for a long time. Why do you think it has exploded now?
Jeff Yu: Big data has been around a long time, but the scale and pervasiveness of its applications are bringing it to the fore now. We consider a set of data “big” when it exhibits scale in a combination of volume, variety, velocity and complexity. What we are seeing now is a convergence: data production is exploding across many sectors of industry at the same time that technology for managing, accessing and deriving value from big data is becoming widely available. Businesses can glean deep insights into customer buying behavior, improve business operations, reduce defects and failures, and/or detect and prevent fraud, waste and abuse. They can do this in a timely and complete manner by combining add-ons to business systems already deployed in their enterprise with specialized information management and analytics tools. In recent years, both the available tools and the number of people familiar with using them have grown dramatically. For example, the open source Hadoop framework for distributed applications is a popular foundation toolset for big data.
WashingtonExec: How is stream computing changing workflow or work-optimization?
Jeff Yu: Traditional approaches typically involve processing data at rest: searching for and retrieving stored data, performing some operations on it and then analyzing it. There is a lot of excitement around inline or “on-the-wire” analytics, where data is processed on-the-fly as it streams into a system or user application. This enables entirely new classes of actions and applications. We can read, classify and analyze system performance and user metrics to optimize business or process flows dynamically. Advances in stream processing now provide the ability to perform analogous functions at high speeds with low latency on large volumes of complex data. Examples are detecting and stopping malware and other cyber threats in real-time or flagging and stopping fraudulent financial transactions. In each case we “close the loop” by using analytics tuned for the situation at hand. A new approach worth watching is in-memory analytics, which combines the ability to perform deep forensic analysis on historical data with the ability to react in near-real-time to analytics performed on streaming data.
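The contrast Yu draws between data at rest and on-the-fly analytics can be illustrated with a small sketch. This is only a toy example under stated assumptions, not a description of any Northrop Grumman system: the function name `flag_outliers`, the rolling-window size and the three-standard-deviation threshold are all hypothetical choices made for illustration. Each transaction amount is judged the moment it arrives, using only a short window of recent history, rather than being stored and analyzed later.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def flag_outliers(
    stream: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[tuple[float, bool]]:
    """Yield (amount, suspicious) for each item as it arrives.

    Hypothetical rule: an amount is suspicious when it lies more than
    `threshold` standard deviations from the mean of the last `window`
    amounts that were accepted as normal.
    """
    recent = deque(maxlen=window)  # rolling window of accepted amounts
    for amount in stream:
        suspicious = False
        if len(recent) == window:  # only judge once the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            suspicious = sigma > 0 and abs(amount - mu) > threshold * sigma
        yield amount, suspicious  # decision is made before the next item is read
        if not suspicious:
            recent.append(amount)  # flagged items do not pollute the baseline
```

Because `flag_outliers` is a generator, a caller can act on each result immediately, for example by blocking a flagged transaction, which is one way to read the “close the loop” idea above. A production system would use far richer features and a dedicated stream-processing platform.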
WashingtonExec: Is the federal government a leader in big data or data mining, or is the commercial sector ahead?
Jeff Yu: In many ways the federal government was the original source and primary user of big data. Examples include data from big science, weather, geospatial imagery, and intelligence. The commercial sector is a significant player in both producing and exploiting big data. Universities, often fueled by government grants, continue to push the frontiers through research and then transition the results to industry. Not long ago technologies such as deep link analysis, semantic search and natural language processing were available only in government, university or industry labs. These are now widely available. A promising trend is that collaboration and synergy between government and industry are growing stronger and accelerating the pace of innovation.
WashingtonExec: How does “unstructured data” work? Why is there suddenly a need for unstructured data to be structured?
Jeff Yu: Traditional enterprise “data processing” deals with data structured into forms or established data exchange formats within a domain. Today, we have a wider body of information that is of interest and relevance within any enterprise and a growing amount of that information is contained in routine forms of communication among people, rather than specifically designed for structured machine processing. This includes email, web information, documents, reports, images, videos, social media information and plain text — information that is mission-critical to many enterprises. We typically characterize data that does not fit an established data model or fit into conventional relational databases as “unstructured data.” Recognition of the utility of unstructured data was slow in the beginning, but we are rapidly catching up in all business domains. We are not trying to transform unstructured data into structured data, but rather bring conceptual order and meaning to the data. Today, we have a rich set of tools to manage and derive meaning from unstructured data.
WashingtonExec: How is big data breaking down information silos?
Jeff Yu: One of the most powerful things big data analytics enables is the ability to discover new meanings and relationships by bringing together data from disparate sources and domains. In some cases, this is planned, such as geo-coding images or event data and overlaying them on a geospatial data set to create multi-layered maps. In many other cases, simply bringing the data together results in unplanned discoveries. Retailers combine many sources of historical and real-time data to spot trends, predict behavior and even adjust pricing or product presentation on the fly. On a deeper level, analyzing the efficacy of potential new drugs or producing insightful national intelligence reports also relies on breaking down traditional information silos. As a result, people want to share the data they have to create a richer set of analytics.
WashingtonExec: Do we have too much data? Is this causing us to miss things?
Jeff Yu: There can never be too much data, just challenges to how we choose to manage and take advantage of the data to create useful information and make decisions. “Fighting” the growth of increasing volumes of data is a losing battle. Harnessing the data we have and being able to access it with a robust information approach is critical. The process starts by understanding what problems we want to solve, what decisions we want to make, what the relevant questions are and then matching our data and information discovery approaches with the tools and techniques to answer those questions. For example, matching “patterns of life” on social behaviors, demographics, economics and geography to crime and incident reports builds a much more complete public safety picture and improves community situational awareness.
WashingtonExec: Do you think it is harder or easier today to come up with a disruptive idea?
Jeff Yu: Today’s business environment is generally receptive to a disruptive idea if its potential value can be readily demonstrated. Disruptive ideas can be well-received if they represent affordable innovation and deliver performance enhancements, operational improvements, or cost savings. A disruptive idea does not have to have a visceral “wow factor.” The time to bring an idea from concept to practice can and should be very short. Enterprises that can create affordable innovation, an environment that encourages rapid experimentation, and a clear path to validation and then adoption will have a distinct competitive advantage.
WashingtonExec: What government agency or private industry sector do you think has the most to gain from Big Data?
Jeff Yu: Big data creates opportunities across the board and one of the most compelling areas where it can make a significant impact is healthcare. According to the Centers for Medicare & Medicaid Services, National Health Expenditure Data, over 17% of GDP or $2.6 trillion per year is spent on healthcare in the US and it is expected to grow at nearly 6% annually. The challenge is in making sense of the data available from electronic health records, health benefits systems, public health surveillance, population health, and clinical care and medical research. This presents a great opportunity to organize and manage the massive health data sets, provide secure and privacy-protected access to data, and run large-scale models and analytics to improve treatment decisions, while reducing unnecessary procedures and fraudulent claims. Another area is personalized medicine, where patient and condition-specific drugs and procedures can dramatically improve health outcomes and drive down costs. Advances in genomics and related “omics” fields are providing data on health conditions at scales never seen before. All of these are driven and enabled by big data — better data, better health.
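To give a rough sense of what the quoted growth rate implies, the arithmetic can be sketched as simple compounding. This is an illustration only, not a forecast from CMS: it assumes the $2.6 trillion figure grows at a constant 6% every year (the interview says “nearly 6%”), and the function name `project_spending` is hypothetical.

```python
def project_spending(base_trillions: float, annual_growth: float, years: int) -> float:
    """Compound a starting amount at a fixed annual growth rate.

    Assumption: growth is constant each year, which real spending is not.
    """
    return base_trillions * (1.0 + annual_growth) ** years


# Illustrative inputs taken from the figures quoted in the interview.
for year in (1, 5, 10):
    projected = project_spending(2.6, 0.06, year)
    print(f"after {year:>2} years: ${projected:.2f} trillion")
```

Under these assumptions spending would be roughly a third higher after five years, which helps explain why analytics aimed at unnecessary procedures and fraudulent claims attract so much attention.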
WashingtonExec: What is the biggest difficulty you have experienced when managing mobile assets?
Jeff Yu: Much of the discussion on mobility is around BYOD, or bring your own device. This involves employees using their personal mobile phones and tablets to access information, including the enterprise network. The focus has been on access control, security and policies for monitoring and tracking. A more interesting challenge is balancing security with the user experience people expect. Mobile users today expect ready access to a wide variety of apps, on-demand access to just about any type of information or media and connections to an extended social network. Employees will bring these expectations about how, when and where to interact with information into the workplace.
WashingtonExec: What is your favorite app or technology gadget?
Jeff Yu: On a personal level, I appreciate anything that can help me gain ready access to news and information. I am currently experimenting with news and media feeds and various aggregators. I am always on the lookout for emerging technology that can make a difference.