
Introduction to Data Science: What is Big Data?

What Is Big Data

First, we will discuss how big data evolved, step by step.

Evolution of Data

How did data evolve, and where did big data come from?
Today, data comes from many sources: advances in technology, the IoT (Internet of Things), social media such as Facebook, Instagram, Twitter and YouTube, and many others. More data is created every day.

1. Evolution of Technology

Let us see how technology has evolved. As the image below shows, in the early days we had landline phones, but now we have smartphones running Android, iOS, and HongMeng OS (Huawei) that make our lives smarter as well as our phones.
Apart from that, we used bulky desktops that processed megabytes of data, and we stored that data on floppy disks. You will remember how little data a floppy could hold. After that, hard disks were introduced, which can store terabytes of data. Now, thanks to modern technology, we can store data in the cloud as well.
Similarly, self-driving cars are now appearing. You must be wondering why we are telling you all this: as technology advances, we generate a lot more data. Take your phone as an example. Have you ever noticed how much data your smartphone generates with every action? Even one video sent through WhatsApp or any other messenger app generates data. This is just one example; you have no idea how much data you generate through everything you do. This data is not in a format that relational databases can handle, and the volume of data has also increased exponentially.
Now consider self-driving cars. Such a car has sensors that record every minor detail, like the size of an obstacle, the distance to the obstacle, and much more, and then it decides how to respond. You can imagine how much data is generated for each kilometer driven in that car. Let's move on to the next source of data.
Evolution of Technology. 

2. IoT

You have probably heard about IoT. The self-driving car from the previous section is itself an example of IoT. Let us discuss what exactly it is. IoT connects a physical device to the internet and makes the device smarter. Nowadays we see smart ACs, smart TVs, and so on. Take a smart air conditioner as an example: the device monitors your body temperature and the outside temperature, and sets the room temperature accordingly.
To do this, it first has to gather data. It gathers data from the internet and from sensors that monitor your body temperature and the surroundings. It fetches data from these various sources and decides what the temperature of your room should be. So we can see that IoT generates a huge amount of data. According to the image below, there will be 50 billion IoT devices by 2020. We will not discuss here how IoT leads to such a huge number of smart devices. Now we will move forward and discuss another factor that generates big data.
IOT
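The smart air conditioner described above can be sketched as a tiny decision rule. This is only an illustration: the `Reading` class, the `choose_room_temp` function, and every threshold in it are assumptions made up for this example, not how any real device works. The point is that every reading and every decision is one more record of data.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    body_temp_c: float     # hypothetical body-temperature sensor value
    outside_temp_c: float  # hypothetical outdoor temperature value


def choose_room_temp(reading: Reading) -> float:
    """Pick a target room temperature from two sensor readings (made-up rule)."""
    target = 24.0  # assumed comfortable default, in degrees Celsius
    if reading.body_temp_c > 37.5:
        target -= 1.0  # the person is warm, so cool the room a little more
    if reading.outside_temp_c > 35.0:
        target += 1.0  # very hot outside, so keep the indoor/outdoor gap smaller
    return target


# Each reading is a new data record; a real device would produce these continuously.
readings = [Reading(36.6, 30.0), Reading(38.0, 30.0), Reading(36.6, 40.0)]
targets = [choose_room_temp(r) for r in readings]
print(targets)
```

Even this toy device produces a new record for every measurement, so billions of such devices add up to a very large stream of data.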

3. Social Media

Social media is one of the most important factors in the evolution of big data. Nowadays everyone is using Facebook, Instagram, YouTube, Twitter, and a lot of other social media websites. These websites hold a huge amount of data. For example, they store our personal details such as name and age, and apart from that, every picture we like, react to, or comment on also generates data. Even the Facebook pages we like generate data. Many people also share videos on Facebook, which generates a huge amount of data. The most challenging part is that this data is not structured, and at the same time it is huge in size. So data is not only generated in huge amounts but also in different formats. For example, videos are in an unstructured format, and the same goes for images. There are millions of ways in which data is generated nowadays, and all of them contribute to big data.

4. Other Factors

All of us visit websites like Amazon, Flipkart, etc. Suppose we want to buy a t-shirt or jeans, so we search through a lot of t-shirts or jeans. Our search history is stored somewhere. When we buy something for the first time, our purchase history is stored as well, along with our personal details. There are numerous ways in which we generate data without knowing it. Sites like Amazon did not exist earlier, so at that time there was no way such a huge amount of data could be generated. Similarly, data is growing in other sectors as well, such as Banking & Finance, Media & Entertainment, Healthcare, and Transportation.
So now the main point: what exactly is big data, and when do we consider data to be big data?
Other Factors

What is Big Data 

Now look at the proper definition: big data "is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database system tools or traditional database applications".
What do we understand from this? Can our traditional, older systems process this data?
No, there is too much data to process. When the traditional systems were invented, we never anticipated that we would have to deal with such an enormous amount of data.
So how do we classify data as big data? For that we have the 5 V's of big data.
Big Data

5 V's of Big Data

Some people write about 3 V's, but here we will discuss 5 V's. The discussion below shows how data becomes big data because of these five characteristics.


1. Volume

The first V of big data is volume: the amount of data is tremendously large. As the diagram shows, the volume of data is increasing exponentially. We were dealing with 4.4 zettabytes of data in 2017, and this is projected to increase to 44 zettabytes in 2020, which is equal to 44 trillion gigabytes. That is a really huge amount of data.
Volume
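The unit conversion in the paragraph above can be checked with a few lines of Python. This sketch assumes decimal (SI) units, where one gigabyte is 10^9 bytes and one zettabyte is 10^21 bytes. The constant and function names are chosen just for this example.

```python
BYTES_PER_GB = 10**9   # SI gigabyte (assumption: decimal units, not binary)
BYTES_PER_ZB = 10**21  # SI zettabyte
GB_PER_ZB = BYTES_PER_ZB // BYTES_PER_GB  # 10**12, i.e. one trillion


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * GB_PER_ZB


gb_2020 = zettabytes_to_gigabytes(44)
growth_factor = 44 / 4.4  # growth between the two figures quoted above

print(f"44 ZB = {gb_2020:,} GB")
print(f"growth factor: about {growth_factor:.0f}x")
```

Since one trillion is 10^12, 44 zettabytes works out to 44 trillion gigabytes, which matches the figure in the text, and the two quoted figures differ by a factor of about ten.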

2. Variety

All this humongous data comes from multiple sources, which brings us to the second V: variety. We deal with many different kinds of files all at once: mp3 files, videos, JSON, CSV, TSV, and many more. This data is structured, unstructured, and semi-structured, all together. Let us explain with the diagram below. We have audio files, video, PNG images, JSON, log files, emails, and various other formats of data. This data is classified into three forms.

I. Structured Format

In the structured format, we have a proper schema for our data: we know which columns will be there. Structured data is data in tabular form.

II. Semi-Structured Format

The second is the semi-structured format. As the diagram shows, this covers JSON, XML, CSV, TSV, and email, where the schema is not properly defined.

III. Unstructured Format

In the unstructured form, we have log files, audio files, video files, and all types of image files.
Variety
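The three forms above can be sketched as a small file classifier. The grouping of extensions follows this article's own classification (other sources sometimes draw the lines differently, for example by treating CSV as structured). The extensions used for structured data (`.sqlite`, `.parquet`), the `.eml` extension for email, and the `classify` function itself are assumptions for illustration.

```python
from pathlib import Path

# Extension groups follow the article's classification; the exact sets are assumptions.
STRUCTURED = {".sqlite", ".parquet"}  # tabular data with a known schema
SEMI_STRUCTURED = {".json", ".xml", ".csv", ".tsv", ".eml"}
UNSTRUCTURED = {".log", ".mp3", ".wav", ".mp4", ".png", ".jpg"}


def classify(filename: str) -> str:
    """Return the data form for a file, judged only by its extension."""
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in SEMI_STRUCTURED:
        return "semi-structured"
    if ext in UNSTRUCTURED:
        return "unstructured"
    return "unknown"


for name in ["orders.sqlite", "events.JSON", "clip.mp4", "server.log"]:
    print(name, "->", classify(name))
```

A real system would look at the content and not just the file name, but the sketch shows why one storage format cannot fit everything that arrives.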

3. Velocity

Big data is also about the speed at which all this varied data accumulates, which brings us to the third V: velocity. Let us explain with the diagram. Earlier we used mainframe computers, huge machines that handled little data, because few people were working with computers at that time. Then computing evolved into the client-server model, and after that came web applications and the internet boom. Day by day, the number of web applications on the internet increased, and now everyone uses these applications from computers as well as from mobile devices. More users, more appliances, more apps, and more mobile devices all generate a lot more data.
When we talk about people generating data, the first thing that comes to mind is social media. Just think how much data Instagram alone generates from your posts and stories.
Now consider social media applications as a whole. According to the diagram below, every 60 seconds Twitter generates about 100 hundred tweets, Facebook gets 695,000 status updates, 11 million Instagram messages are sent, 698,445 Google searches are made, and 168 million emails are sent. That is almost 1,820 terabytes of data generated every minute. The number of mobile users is also growing: 217 new mobile users are added every minute. That is a lot of data to compute on and to organize properly, so it becomes big data.
Velocity
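To get a feeling for this speed, the per-minute figures can be scaled up to a full day. The sketch below uses a subset of the numbers quoted above. The dictionary keys are names made up for this example, and the result assumes the rate stays constant over the whole day.

```python
MINUTES_PER_DAY = 60 * 24

# Per-minute figures as quoted in the text above (a subset)
per_minute = {
    "facebook_status_updates": 695_000,
    "google_searches": 698_445,
    "emails_sent": 168_000_000,
    "terabytes_generated": 1_820,
    "new_mobile_users": 217,
}

# Assumption: the per-minute rate is constant for 24 hours
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")
```

At these rates, 1,820 terabytes per minute becomes more than 2.6 million terabytes per day, which is why the data has to be processed as it arrives and cannot simply pile up.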

4. Value

Now the bigger problem is extracting useful data, which brings us to the next V: value. First, we need to mine useful content from our data; that is, we make sure our dataset has some useful fields. Then we clean the data and perform analytics on it. After analysis, the dataset has value: it helps our business grow through insights that were not possible earlier. Whatever data has been generated, it only makes sense to keep it if it has value and helps us grow our business.
Value
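The steps above (pick the useful fields, clean the data, then analyze it) can be sketched on a few purchase records. The records, the field names, and the `clean` function are all made up for this example; a real pipeline would be far larger.

```python
from collections import defaultdict

# Hypothetical raw purchase records; not real data
raw_records = [
    {"user": "a", "category": "t-shirt", "price": "15.0", "session_id": "x1"},
    {"user": "b", "category": "jeans", "price": "40.0", "session_id": "x2"},
    {"user": "c", "category": "t-shirt", "price": "", "session_id": "x3"},
    {"user": "d", "category": " Jeans ", "price": "35.0", "session_id": "x4"},
]


def clean(record):
    """Keep only the useful fields; return None if the price is unusable."""
    category = record["category"].strip().lower()
    try:
        price = float(record["price"])
    except ValueError:
        return None  # drop records with a missing or malformed price
    return {"category": category, "price": price}


cleaned = [row for row in map(clean, raw_records) if row is not None]

# Simple analysis: total revenue per product category
revenue = defaultdict(float)
for row in cleaned:
    revenue[row["category"]] += row["price"]

print(dict(revenue))
```

The revenue per category is the kind of small insight that gives the raw records their value.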

5. Veracity

Getting value from that data is a big challenge, which brings us to the next V: veracity.
Big data has a lot of uncertainty and inconsistencies. When we dump such a huge amount of data, some data packets are bound to be lost in processing. So we need to fill in the missing data, then mine it again, process it, and come up with the best insights possible. In the diagram below, some of the data is missing, some values are very small, and some values are very large.

Veracity
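A minimal sketch of this clean-up step is shown below: missing values are filled in and values that look too small or too large are flagged. The sample readings, the choice of the median as the fill value, and the plausible range are all assumptions for illustration; the right choices depend on the data.

```python
from statistics import median

# Hypothetical sensor readings: None marks a value lost in processing
readings = [21.5, None, 22.0, 0.1, 21.8, None, 250.0, 22.3]

# Fill missing values with the median of the known values
# (the median is less affected by extreme values than the mean)
known = [v for v in readings if v is not None]
fill_value = median(known)
filled = [fill_value if v is None else v for v in readings]

# Flag values outside an assumed plausible range for a later check
LOW, HIGH = 10.0, 40.0
suspicious = [v for v in filled if not LOW <= v <= HIGH]

print("filled:", filled)
print("suspicious:", suspicious)
```

Only after this kind of repair does it make sense to start mining the data again.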
Big data brings a lot of problems and a lot of opportunities, which we will discuss in the next article.













