What Exactly Is Big Data, Anyway?

By Kannan Sankaran
August 13, 2013 Updated: August 23, 2013

We live in a time when nearly everyone helps create a very large amount of data every day, from engaging with social media sites like Twitter and Facebook, to watching videos on multimedia sites like YouTube and Vimeo, to performing searches using Google and Bing. What happens to this data? Where does it get stored? How does it all become relevant? Understanding big data gives us the answer.

To understand big data, it is essential to first know about the three main Vs of data – Volume, Variety, and Velocity, as outlined by Doug Laney of Gartner Inc. in a report in February 2001:

1) Volume – This refers to the amount of data that companies store in data centers every day. According to StatisticBrain.com, a staggering 58 million tweets are sent each day, all of which Twitter stores.

2) Variety – This refers to the collection of data in various structured and unstructured forms, such as GPS sensor readings, images, videos, and blog posts. My personal Samsung Galaxy Note 2 phone has numerous sensors that also contribute to this data.

3) Velocity – This refers to the speed at which companies analyze data to provide a better user experience. If I cannot get Google search results within a few seconds, I get impatient.

Since then, companies have added several more Vs, including Veracity, Validity, and Volatility, as they started using big data technologies.

Big data can thus be understood as the vast quantities of data collected by companies and then processed to gain insights into users and to “spot business trends, prevent diseases, combat crime and so on.” If managed well, “the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account,” as stated by The Economist in “Data, data everywhere.”

Companies Using Big Data

The enormous potential in exploring big data for commercial use has propelled several industries to make their own case for using such technologies.

Apixio, a company based in San Mateo, California, uses big data to organize large volumes of patient records from various sources (Variety) and to provide healthcare providers with a meaningful way to search for the information.

Knewton, an adaptive learning company based in New York City, partners with pioneering learning companies, publishers, content providers, and educational institutions, utilizing big data technologies to improve educational experiences for every single student.

Kapow Software, headquartered in Palo Alto, California, is a leading provider of smart process applications that enable companies to increase their responsiveness to customers (Velocity).

Companies like Netflix and Amazon use sophisticated big data algorithms to provide movie and book recommendations to their users.

Technology / Data Science for Big Data

The seed of large-scale data processing was sown by search engine giant Google in 2004, when it published a research paper on an architectural framework called MapReduce. The framework splits large amounts of data so that several machines, called Mappers, can process the pieces in parallel in the Map phase; several other machines, called Reducers, then combine the results in the Reduce phase. The framework was so successful that an open source project named Hadoop was created in 2005, and it is now one of the most popular software products for large-scale, data-intensive tasks.
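To make the Map and Reduce phases concrete, here is a minimal single-machine sketch in Python that counts words. It is only an illustration, not Google's or Hadoop's actual code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, the "machines" are ordinary function calls that run one after another, and the grouping step between the two phases, which the description above does not mention, is included because the Reducers need all values for a given key collected together.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key so each key is handled by one reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all the values emitted for a single key."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be sent to a different mapper
    # machine; in this sketch the mappers simply run sequentially.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two chunks stand in for a large data set split across machines.
    print(word_count(["big data is big", "data about data"]))
```

Because each call to map_phase looks at only its own chunk, and each call to reduce_phase looks at only one key, both phases could in principle be spread across many machines, which is the property that makes the approach scale.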

In 2006, Amazon played a key role in providing low-priced remote computing services to external customers over the Internet (the cloud) by creating Amazon Web Services, which has also become hugely successful.

As hardware got cheaper and a large amount of free software became available, numerous startup companies tried their hand at providing big data services.

Big data processing and analysis build on research in several fields, including computer science, statistics, mathematics, data engineering, pattern recognition, visualization, artificial intelligence, machine learning, and high performance computing. The term data science is increasingly used to refer to analyzing data using a combination of these fields.

A new role in business called data scientist has emerged. The term was originally coined by DJ Patil and Jeff Hammerbacher, who built the first formal Data Science teams at LinkedIn and Facebook, respectively, and published a report about it in O’Reilly Radar.

Privacy Concerns Regarding Big Data

Given the advances in computing using big data technologies, it is also important to examine the privacy implications of data collection. The NSA PRISM surveillance program, revealed by the NSA leaker Edward Snowden and published by The Guardian and The Washington Post, utilized big data technologies to collect information about people, including their emails, search history, and live chats. In a press conference on August 9, President Obama acknowledged that the domestic spying has troubled Americans and hurt the country’s image abroad, but he called it a critical counterterrorism tool.

The Future of Big Data

Big data’s future seems bright, with a large impact expected in many different areas of human life. In a recent study, McKinsey & Company, a leading management consulting firm, stated that the increased use of big-data analytics could, by 2020, “increase annual GDP in retailing and manufacturing by up to $325 billion and save as much as $285 billion in the cost of health care and government services”.

To prepare future big data practitioners, several universities throughout the US are beginning to offer graduate-level programs and professional certificates in Data Science. MOOC platforms such as Coursera and edX are already offering big data courses.

Rick Smolan, legendary photographer and author of the book “The Human Face of Big Data,” says we are in the caveman era of big data and that it will influence every aspect of life on earth. Watch the THNKR interview “What Big Data Says About You”.

Was this article useful to you? Please write us your comments and let us know what you think.
