Monday, November 27, 2017

Whaddya Mean, Big Data?

It's been going on for a while now. We hear the term "Big Data" used repeatedly, often in contexts that suggest quite different meanings. To some people, it seems to suggest an insidious assault on personal privacy. To others, it seems to mean the collection of companies like Facebook and Google that thrive and use their success to influence public policy.

Except among IT professionals, the term seems to be used mostly as a pejorative: something that evokes fear or derision, or some force to be resisted. In common usage, Big Data ignites the same sense of dread that Big Petroleum, Big Government, or Big Pharmaceutical does.

But what does it mean when used in the Information Technology lexicon? Is it something to watch hawkishly, or is it something that holds the promise that we could know more, make better decisions, and innovate more rapidly?

When we talk about Big Data in IT, we mean information that is either available in very large quantities or arrives very rapidly. Usually we are referring to data sets large enough that it takes more than one computer to read and analyze them. Or we might be talking about data collected at such a pace that even moving it to a common storage resource is challenging.
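To make the "more than one computer" idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular Big Data product works: the function names (split_into_chunks, summarize, combine) and the sample readings are made up for this example, and each chunk simply stands in for the share of data one machine would handle. The point is that each machine can reduce its share to a small summary, and only the summaries need to travel.

```python
from typing import List, Tuple


def split_into_chunks(values: List[float], n_chunks: int) -> List[List[float]]:
    """Divide the data into roughly equal shares, one per (hypothetical) machine."""
    if not values:
        return []
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def summarize(chunk: List[float]) -> Tuple[int, float]:
    """Work done locally on one machine: reduce its share to (count, sum)."""
    return len(chunk), sum(chunk)


def combine(partials: List[Tuple[int, float]]) -> float:
    """Merge the small per-machine summaries into one overall mean."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


# Made-up sample data standing in for a data set too large for one machine.
readings = [float(x) for x in range(1, 101)]

chunks = split_into_chunks(readings, 4)
partials = [summarize(chunk) for chunk in chunks]
overall_mean = combine(partials)
print(overall_mean)  # 50.5
```

The answer does not depend on how many shares the data is split into, which is what lets this style of work spread across as many machines as the data requires.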

Another facet of Big Data is variety. Most of the time, we are looking at information that doesn't come in a nicely structured, well-groomed format. Often it is messy, arrives in a variety of forms, and may even defy transformation into an orderly, structured form.
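As a rough illustration of what variety means in practice, the Python sketch below tries to coerce records that arrive in three different shapes into one common form. Everything here is assumed for the sake of the example: the field names (city, temp_c), the three input shapes, and the sample records are invented, and real data is usually far messier. Note that some records simply cannot be rescued, which is the "defy transformation" case.

```python
import csv
import io
import json
import re
from typing import Dict, Optional, Union

Record = Dict[str, Union[str, float]]


def normalize(raw: str) -> Optional[Record]:
    """Try to coerce one raw record into {'city': ..., 'temp_c': ...}, else None."""
    raw = raw.strip()

    # Shape 1 (assumed): a JSON object with "city" and "temp_c" keys.
    if raw.startswith("{"):
        try:
            obj = json.loads(raw)
            return {"city": str(obj["city"]), "temp_c": float(obj["temp_c"])}
        except (ValueError, KeyError, TypeError):
            return None

    # Shape 2 (assumed): a two-field CSV line, "city,temperature".
    fields = next(csv.reader(io.StringIO(raw)), [])
    if len(fields) == 2:
        try:
            return {"city": fields[0].strip(), "temp_c": float(fields[1])}
        except ValueError:
            pass  # not numeric; fall through to the free-text attempt

    # Shape 3 (assumed): free text such as "It was 21.5 C in Cairo".
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*C in ([A-Za-z ]+)", raw)
    if match:
        return {"city": match.group(2).strip(), "temp_c": float(match.group(1))}

    return None  # this record defies transformation


# Invented sample records, one per shape plus one that cannot be parsed.
samples = [
    '{"city": "Oslo", "temp_c": 4.5}',
    "Lima, 19",
    "It was 21.5 C in Cairo",
    "???",
]
cleaned = [normalize(s) for s in samples]
for record in cleaned:
    print(record)
```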

We can dread it if we wish, but there are some hopeful implications that carry the promise of problem solving on a societal scale.

In the US, data scientists have used information from the National Climatic Data Center to build accurate models to help food growers plan for maximum crop yield. The ability of humans to produce food in quantities sufficient to feed everyone is diminishing, and the discovery of more efficient means is imperative.

Telemetry collected from motorists all over the country feeds mapping and navigation systems that help drivers make intelligent decisions about avoiding congestion. The data that comes from this effort is sufficiently valuable that it's not only Google and Apple who are collecting it. Auto manufacturers like Mercedes-Benz have installed systems in most of their newer vehicles to begin the effort as well. The data will yield insights that go far beyond the question of how to avoid the next traffic jam.

It's not only large organizations and massive corporations that are collecting and deriving answers from large data sets. A retired journalist in Virginia has assembled a crime database that may provide insight into the identities of previously unidentified serial killers. This story in the New York Times profiles his efforts.

Not surprisingly, most of the profitable uses of Big Data center on marketing, in both the commercial and political marketplaces. And there are certainly valid concerns about the implications of this activity.

But to brand a whole domain of IT innovation with worries about potential abuse simply hides the many useful, even groundbreaking possibilities that lie ahead.

A quick look at the giant datasets freely available to the public on Amazon S3 reveals riches like OpenStreetMap (a free, editable map of the world, created and maintained by volunteers), Common Crawl (a corpus of web crawl data composed of over 5 billion web pages), and GDELT (over a quarter-billion records monitoring the world's news from nearly every country, updated daily).

Insight once available only to well-funded researchers and giant corporations could already be in the hands of strategists for universities, local governments, and hobbyists virtually everywhere. The emergence of Big Data and of the tools to derive meaning from it represents a big step forward in harnessing the body of human knowledge. While there may be dark implications in this, the overall picture is largely bright.