Characteristics of Big Data

2:30 with Craig Dennis and Jared Smith

How do you characterize Big Data?

Teacher's Notes

Terms

petabyte -- 2^50 bytes; 1024 terabytes, or roughly a million gigabytes.
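
As a quick check of the arithmetic in that definition, here is a minimal Python sketch. It assumes the binary (power-of-two) convention the definition uses; the constant and function names are made up for illustration.

```python
# Binary (power-of-two) units, matching the 2^50 definition in the notes.
GIGABYTE = 2**30
TERABYTE = 2**40
PETABYTE = 2**50


def describe_petabyte() -> dict:
    """Return how many bytes, terabytes, and gigabytes make up one petabyte."""
    return {
        "bytes": PETABYTE,
        "terabytes": PETABYTE // TERABYTE,
        "gigabytes": PETABYTE // GIGABYTE,
    }


if __name__ == "__main__":
    for unit, count in describe_petabyte().items():
        print(f"1 petabyte = {count:,} {unit}")
```

The gigabyte count comes out to 1,048,576, which is why "a million gigabytes" is an approximation rather than an exact figure.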

The 4V's of Big Data

Volume - the scale
Velocity - the speed
Veracity - the certainty
Variety - the diversity

Learn More

There may be more than four V's in the world of big data, but for those of you getting started in the industry, these are the most relevant to remember.

Big data spans a broad range of data sets 0:00

that are nearly impossible to use without specialized tools and systems. 0:03

Now, I'd like you to think of big data as not only the data sets that are being 0:06

processed, but also the infrastructure that's needed to support such analysis. 0:10

Now this spans from ingesting the data in the back end all the way to displaying 0:14

visualizations on the front end. 0:18

Big data plays a part in all of that. 0:19

Big data is typically characterized by what is known as the four V's. 0:22

That's volume, velocity, variety, and veracity. 0:26

Let's take a look at each one of those. 0:30

The size of the data helps to define whether it can 0:32

actually be considered big data. 0:35

You define the volume by the size of the data in gigabytes, 0:37

terabytes or even petabytes. 0:41

Now this varies widely across data sets, 0:43

but usually anything over one gigabyte is considered to be a large volume. 0:44

The velocity helps you define the challenges and 0:50

the demands that make growth and development difficult. 0:52

This is often defined by the problem space. 0:55

Now for example, one problem space you might encounter is that you're doing 0:57

search querying in real time, so you want to process data extremely quickly. 1:01

This is typically called streaming. 1:06

On the other hand, 1:08

if you want to process data once per day, that's called batch processing. 1:09

Now for instance, maybe you want to process 1:13

usage data from a whole suite of mobile applications at the end of each day. 1:16

You won't really be bothered by the latency of getting a response back. 1:20

And you can prepare the data to be ready when you need it. 1:24
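
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration, not how any particular big data system works: the event shape (`{"app": ..., "user": ...}`) and both function names are assumptions invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical event shape: {"app": "maps", "user": "a"}
Event = Dict[str, str]


def batch_counts(events: Iterable[Event]) -> Counter:
    """Batch: wait until the whole day's events are collected, then process once."""
    return Counter(event["app"] for event in events)


def streaming_counts(events: Iterable[Event]) -> Iterator[Counter]:
    """Streaming: update and emit a running result as each event arrives."""
    running: Counter = Counter()
    for event in events:
        running[event["app"]] += 1
        yield running.copy()


if __name__ == "__main__":
    day = [
        {"app": "maps", "user": "a"},
        {"app": "mail", "user": "b"},
        {"app": "maps", "user": "c"},
    ]
    print("batch:", dict(batch_counts(day)))
    for snapshot in streaming_counts(day):
        print("stream:", dict(snapshot))
```

Both paths end at the same totals. What changes is when an answer is available: the batch version returns once at the end of the day, while the streaming version has a usable answer after every event.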

Data can be very diverse, often containing both structured and unstructured sources. 1:27

Often, it's made up of many different types. 1:32

Some of it is dense or sparse and sometimes it's dependent on time and 1:35

sometimes it's not. 1:38

There are many other defining characteristics. 1:40

All of these different properties of data mean that processing it 1:42

can be significantly harder due to the amount of work ahead of time that 1:46

has to be done to get it into the correct format. 1:49
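
To make that up-front work concrete, here is a small Python sketch that pulls three differently shaped sources into one common record format. The field names (`user`, `message`, `msg`) and the "name: message" text convention are assumptions made up for this example.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]


def from_csv(text: str) -> List[Record]:
    """Structured source: columns are already named."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": row["user"], "message": row["message"]} for row in reader]


def from_json_lines(text: str) -> List[Record]:
    """Semi-structured source: one JSON object per line; fields may be missing."""
    records = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            records.append({
                "user": obj.get("user", "unknown"),
                "message": obj.get("msg", ""),
            })
    return records


def from_free_text(text: str) -> List[Record]:
    """Unstructured source: assumes a made-up 'name: message' line convention."""
    records = []
    for line in text.splitlines():
        user, separator, message = line.partition(":")
        if separator:  # skip lines that do not follow the convention
            records.append({"user": user.strip(), "message": message.strip()})
    return records


if __name__ == "__main__":
    combined = (
        from_csv("user,message\nann,hi\n")
        + from_json_lines('{"user": "bob", "msg": "yo"}')
        + from_free_text("cy: hello there")
    )
    for record in combined:
        print(record)
```

Each source needs its own parsing rules before any analysis can start, and the less structure a source has, the more guesswork the parser has to encode.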

The trustworthiness and validity of captured data is not always immediately 1:52

clear and therefore, can vary greatly, affecting accurate analysis. 1:57

This could lead to longer pre-processing times and 2:01

more specific requirements on how much data is necessary to make 2:04

effective decisions with the tools available. 2:08
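
One way to picture that pre-processing step is a validation pass that separates trustworthy records from doubtful ones. The Python sketch below is a simplified illustration: the sensor-reading shape and the plausibility rule (a temperature between -90 and 60 degrees Celsius) are assumptions chosen for the example, not rules from the course.

```python
from typing import Dict, List, Tuple

Reading = Dict[str, object]


def is_trustworthy(reading: Reading) -> bool:
    """Hypothetical rule set: a reading needs a sensor id and a plausible temperature."""
    temp = reading.get("temp_c")
    return (
        bool(reading.get("sensor"))
        and isinstance(temp, (int, float))
        and not isinstance(temp, bool)
        and -90.0 <= temp <= 60.0
    )


def split_by_veracity(readings: List[Reading]) -> Tuple[List[Reading], List[Reading]]:
    """Separate readings that pass validation from those that need review."""
    kept = [r for r in readings if is_trustworthy(r)]
    rejected = [r for r in readings if not is_trustworthy(r)]
    return kept, rejected


if __name__ == "__main__":
    sample = [
        {"sensor": "a", "temp_c": 21.5},
        {"sensor": "", "temp_c": 20},      # missing sensor id
        {"sensor": "b", "temp_c": 999},    # implausible value
        {"sensor": "c", "temp_c": None},   # missing value
    ]
    kept, rejected = split_by_veracity(sample)
    print(f"kept {len(kept)} of {len(sample)} readings")
```

Every rejected record shrinks the usable data set, which is why low veracity raises the amount of raw data needed to reach a sound decision.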

So once again, those four V's are volume, the scale of data, velocity, the speed 2:11

of data, veracity, the certainty of data and variety, the diversity of data. 2:16

With these four characteristics in mind, let's explore why big data is so 2:22

important and why you should be aware of it, right after this quick break. 2:26
