Bigger isn’t always better when it comes to extracting meaning from data.
For Richard Heimann, the chief data scientist at L-3 National Security Solutions, the hardest part of his discipline is simplifying the taxonomy of questions that can be asked of data in order to articulate exactly what data science can solve.
The daily challenge revolves more around identifying what can be asked of diverse data than around admiring its mere size.
“People often hinge data science with big data. Big data needs data science, but I don’t think data science needs big data,” Heimann told WashingtonExec in a recent phone interview, elaborating on the importance of communicating the intrinsic value of data. “My job is to communicate down [the applicable value of data science] in a consumable manner … because once you begin to understand the intrinsic value of data, you can begin to take advantage of what data science can do for you.”
He has tasked himself with changing cultural norms and shattering the convention, common in discussions of big data, that volume of data trumps quality of data.
“Our program managers and the government program managers understand how to manage traditional IT projects and software development projects, but data science projects tend to be more complex and consequently they require more education and a little more patience on our part,” Heimann said of his job. “The task of the data scientist is to arm them with the right way to think about analytical type of questions. Data Science is about allowing data to speak and communicate in novel ways.”
As a data scientist, Heimann identifies hidden patterns in sets of information for L-3 National Security Solutions, a unit formed in March after L-3 Communications acquired Data Tactics Corporation. There he focuses on big data, advanced spatial analytics and cloud computing as he builds his company’s data science team.
L-3 NSS delivers big data analytics and cloud computing solution services primarily to the Department of Defense.
Whereas Heimann’s work with the original Data Tactics focused primarily on research with the Defense Advanced Research Projects Agency (DARPA), at L-3 he applies data science more broadly to problems within the Defense Department and intelligence community.
“The domain [since the acquisition] is a little different now that we are integrating more in the customer space of L-3,” Heimann said. “The scale of the problem, the size of the data effectively, is a little different, and I think those are exciting areas of exploration for the data science team.”
One example of his work? Heimann in March co-published a book on analyzing social media data that aims to understand groups and the ways in which information spreads.
In the book, Social Media Mining with R, he and his co-author Nathan Danneman use a series of case studies to detail the methodology behind applying sentiment analysis tools to social media data.
“We’re interested in group dynamics and topics and conversations,” Heimann said. “When done responsibly, social media mining and the examination of social media data, specifically, can elucidate some non-trivial things about groups of people.”
He and Danneman mined text from sources like Twitter and the Beige Book (the Federal Reserve’s report and commentary on current economic conditions, published eight times a year) using lexicon-based approaches, naive Bayes classifiers and item response theory in search of patterns. They found that measures derived from such text, used as a proxy for economic growth, reveal more about a country’s economy than merely tabulating the country’s GDP.
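To make the simplest of those techniques concrete, here is a minimal sketch of a lexicon-based sentiment score. It is written in Python rather than the R used in the book, and it is not the authors' method: the word lists, the `lexicon_score` function and the sample sentences are all invented for illustration.

```python
import re

# Toy lexicon, invented for this example. Real sentiment lexicons
# contain thousands of words, often with graded scores.
POSITIVE = {"growth", "strong", "improved", "increase", "gains"}
NEGATIVE = {"decline", "weak", "slowed", "decrease", "losses"}


def lexicon_score(text: str) -> float:
    """Return (positive - negative) / matched words, a value in [-1, 1].

    Text containing no lexicon words scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


if __name__ == "__main__":
    # Hypothetical sentences, not actual Beige Book or Twitter text.
    samples = [
        "Strong growth and improved hiring",
        "Sales slowed; weak demand but some gains",
    ]
    for sample in samples:
        print(f"{lexicon_score(sample):+.2f}  {sample}")
```

Scoring each report or post this way and averaging by date and region would yield the kind of text-derived series that could then be compared against an official indicator.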
“What Beige Book and other crowdsourced data offer is inward asymptotics. In other words, we get denser data spatially and temporally with inward asymptotics,” Heimann said. “What you see is that [Beige Book text] was pretty close to GDP, but since they’re measured 8 times a year instead of once across 9 or 10 US cities you get a deaggregate examination across time and space, which is pretty cool. When we have data collected daily or monthly and at the county or block level we increase our variability and learn more. GDP/GPI fail in this important way.”
Heimann also teaches a class on Human Terrain Analysis at George Mason University aimed at increasing awareness of the spatially referenced social layer.
He and his team will host a series of workshops, titled “Data Science for Government,” during the summer and fall to promote data science to the government. Heimann and Danneman also have a one-day workshop with District Data Labs in the works, where they will draw on the book to cover their work in social media mining.
You can follow Heimann on the Data Tactics blog.