Except among IT professionals, the term seems to be used mostly as a pejorative: something that evokes fear or derision, or that names some force to be resisted. In common usage, Big Data ignites the same sense of dread as Big Petroleum, Big Government, or Big Pharmaceutical.
But what does it mean when used in the Information Technology lexicon? Is it something to watch hawkishly, or is it something that holds the promise that we could know more, make better decisions, and innovate more rapidly?
When we talk about Big Data in IT, we mean information that is available in very large quantities or that arrives very rapidly. Usually we are referring to data sets so large that it takes more than one computer to read and analyze them. Or we might be talking about data collected at such a pace that even moving it to a common storage resource is challenging.
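To make the "more than one computer" idea concrete, here is a minimal sketch of the split, process, and merge pattern that distributed tools are built around. It is an illustration only: the function names (`split_into_chunks`, `count_words`, `merge_counts`) and the three sample records are made up for this example, and everything runs on one machine. In a real cluster, each chunk would be handed to a different computer.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Deal records round-robin into n_chunks lists, one per hypothetical worker."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def count_words(chunk):
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Reduce step: add the per-chunk counts together into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample data standing in for a data set too large for one machine.
records = [
    "big data is big",
    "data arrives fast",
    "data is messy",
]

chunks = split_into_chunks(records, n_chunks=2)
# Each call below touches only its own chunk, so it could run on a separate computer.
partials = [count_words(chunk) for chunk in chunks]
totals = merge_counts(partials)
print(totals)
```

The point of the design is that no step needs to see the whole data set at once: only the small per-chunk summaries have to travel to a common place to be merged.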
Another facet of Big Data is variety. Most of the time, we are looking at information that doesn't come in a nicely structured, well-groomed format. Often it is messy, arrives in many different forms, and may even defy transformation into an orderly structure.
We can dread it if we wish, but there are some hopeful implications that carry the promise of problem solving on a societal scale.
In the US, data scientists have used information from the National Climatic Data Center to build accurate models to help food growers plan for maximum crop yield. The ability of humans to produce food in quantities sufficient to feed everyone is diminishing, and the discovery of more efficient means is imperative.
It's not only large organizations and massive corporations that are collecting and deriving answers from large data sets. A retired journalist in Virginia has assembled a crime database that may provide insight into the identities of previously unidentified serial killers. This story in the New York Times profiles his efforts.
Not surprisingly, most of the profitable uses of Big Data center on marketing, in both the commercial and political marketplaces. And there are certainly valid concerns about the implications of this activity.
But to brand a whole domain of IT innovation with worries about potential abuse simply hides the many useful, even groundbreaking, possibilities that lie ahead.
A quick look at the giant datasets freely available to the public on Amazon S3 reveals riches like OpenStreetMap (a free, editable map of the world, created and maintained by volunteers), Common Crawl (a corpus of web crawl data composed of over 5 billion web pages), and GDELT (over a quarter-billion records monitoring the world's news from nearly every country, updated daily).
Insight once available only to well-funded researchers and giant corporations could already be in the hands of strategists for universities, local governments, and hobbyists virtually everywhere. The emergence of Big Data and the tools to derive meaning from it represents a big step forward in harnessing the body of human knowledge. While there may be dark implications, the overall picture is largely bright.