When you start dealing with large datasets, the first question you should ask is “What kind of data are we dealing with?”
Terms
- GIS -- Geographic Information Systems
- Relational Database -- A database that stores data according to a predefined schema.
- SQL -- Structured Query Language
- NoSQL -- An overloaded term for non-relational databases
When you start dealing
with large datasets,
0:00
the first question you should ask is,
what kind of data are we dealing with?
0:02
Is it structured text data?
0:06
Or is it unstructured?
0:08
Is data missing?
0:09
Is it not text?
0:11
Is it video?
0:11
Or is it audio?
0:12
Does it have a location, or
geospatial information tied to it?
0:13
Are there a lot of default
values in the data?
0:17
The list goes on and on.
0:19
To really make sense out of the data that
you have on hand, and to begin to solve
0:21
the business problems with it, the first
step is to recognize the type of data
0:25
that you have and to put it into the
appropriate systems for storing that data.
0:30
For instance, if your data is something
like a group of customers and
0:35
their buying habits, it will probably
fit best in a relational SQL database or
0:38
a document-based NoSQL database.
0:42
If your data is structured like a social
network, meaning it deals with how
0:45
things are interconnected,
you probably want to use a graph database.
0:48
If you have thousands
of videos to process,
0:53
you need to know that before you try
to store it in systems not equipped for
0:55
the high levels of bandwidth needed
to transfer that data in and out.
0:59
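The pairing of data shapes with storage systems described above can be sketched as a simple lookup. This is only an illustration: the category names ("tabular", "network", "media") and the function name are made up for this example, and a real decision would weigh many more factors.

```python
# Hypothetical mapping from a rough description of the data to the storage
# family mentioned in the talk. The category names are invented for this sketch.
STORAGE_BY_SHAPE = {
    "tabular": "relational SQL database or document-based NoSQL database",
    "network": "graph database",
    "media": "storage built for high-bandwidth transfer",
}


def suggest_storage(shape: str) -> str:
    """Return the storage family suggested for a given data shape."""
    try:
        return STORAGE_BY_SHAPE[shape]
    except KeyError:
        # Unknown shapes are rejected rather than guessed at.
        raise ValueError(f"unknown data shape: {shape!r}") from None


print(suggest_storage("network"))  # graph database
```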
Let's explore the major types
of data that you may encounter.
1:04
Structured data is data which is
formatted in a specific structure.
1:07
This means we can often separate
the data into fields that we can access.
1:13
Structured data makes the most sense for
1:18
a relational database where
the structure won't change very often.
1:20
Some common examples of structured
data are application logs,
1:23
customer information, and financial data.
1:27
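As a small illustration of separating structured data into fields, here is a minimal sketch that parses one application log line. The log format (date, time, level, message) and the sample line are assumptions made for this example, not a format from the course.

```python
import re

# Assumed log format: "<date> <time> <LEVEL> <message>".
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)"
)


def parse_log_line(line: str) -> dict:
    """Split one log line into named fields, or raise if it does not fit."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match the expected format: {line!r}")
    return match.groupdict()


record = parse_log_line("2024-01-15 09:30:00 ERROR payment service timed out")
print(record["level"])    # ERROR
print(record["message"])  # payment service timed out
```

Because every line follows the same structure, each field could map directly onto a column in a relational table.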
Now, the converse of structured data is unstructured data;
1:30
this is data which cannot easily fit
into one or more defined labels.
1:34
For instance, when Twitter analyzes
tweets for malicious content,
1:38
they can never be certain of exactly
what the data represents in the tweets.
1:41
Words and content can mean very different
things, and because there is so
1:45
much data to analyze, the unstructured
nature makes processing it even harder.
1:49
Common examples of unstructured data
include social media posts, books, and
1:54
healthcare data.
1:58
One of the most common unstructured forms
of data, video and audio, actually deserves its own type.
2:00
This kind of data requires high amounts
of bandwidth to process and store.
2:05
Compression is almost always necessary.
2:09
You can think of examples from
Netflix to social media posts.
2:12
Video conferencing apps also produce and
consume this type of data.
2:15
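To get a feel for why compression is almost always necessary, a rough back-of-the-envelope calculation for uncompressed video helps. The numbers below (1920x1080 frames, 24 bits per pixel, 30 frames per second) are assumptions chosen for illustration, and audio is ignored.

```python
def raw_video_bits_per_second(width: int, height: int,
                              bits_per_pixel: int, fps: int) -> int:
    """Bit rate of uncompressed video: pixels per frame x bits x frames."""
    return width * height * bits_per_pixel * fps


# Assumed example: 1080p, 24-bit color, 30 frames per second.
raw = raw_video_bits_per_second(1920, 1080, 24, 30)
bytes_per_hour = raw * 3600 // 8

print(raw)             # 1492992000 bits per second, roughly 1.5 Gbit/s
print(bytes_per_hour)  # 671846400000 bytes, roughly 672 GB per hour
```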
A common request is to store
additional location data or
2:19
metadata alongside other information.
2:22
When this happens,
a world of possibilities opens up.
2:25
Now it's possible to analyze everything
from simple location tracking
2:29
to how some object interacts with
the product in different places over time.
2:33
A common example of this is any
app that tracks your location,
2:38
any sort of mapping or
direction app like Waze or Google Maps.
2:42
It could also be used for analyzing
fleets of trucks for a company, or
2:46
even for the military.
2:50
Think of drones tracking
soldiers on the ground.
2:51
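One basic building block for this kind of location analysis is the distance between two latitude/longitude points. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the function name and the sample coordinates are hypothetical, and real GIS systems offer more precise models.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical positions reported by one truck over time (lat, lon).
track = [(40.0, -75.0), (40.5, -75.0), (41.0, -75.5)]

# Sum the distance of each leg to estimate the total distance traveled.
total_km = sum(
    haversine_km(*start, *end) for start, end in zip(track, track[1:])
)
print(round(total_km, 1))
```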
The Internet of Things, or IoT for
short, has created connected devices and
2:54
sensors all over the world.
2:58
These range from weather sensors for
3:00
warning of approaching tornadoes,
all the way to fitness apps.
3:01
The accelerometer in every modern
smartphone can now record activity
3:05
information and send it back to a plethora
of apps for optimizing your workouts.
3:10
Sensors can transmit
all kinds of data,
3:15
from structured text to video and audio.
3:18
In addition to those examples,
some other use cases are vehicle
3:21
communication systems, smart home
devices and traffic monitoring systems.
3:26
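As a rough picture of what a structured sensor message might look like, here is a minimal sketch of one reading serialized to JSON and read back. The field names, the device ID, and the values are all invented for this example; real devices define their own formats.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SensorReading:
    """One hypothetical reading sent by a connected device."""
    device_id: str
    timestamp: str  # ISO 8601 text, e.g. "2024-01-15T09:30:00Z"
    kind: str       # what is being measured
    value: float
    unit: str


reading = SensorReading(
    device_id="weather-001",
    timestamp="2024-01-15T09:30:00Z",
    kind="wind_speed",
    value=12.5,
    unit="m/s",
)

# Serialize for transmission, then rebuild the reading on the receiving side.
payload = json.dumps(asdict(reading))
decoded = SensorReading(**json.loads(payload))

print(payload)
print(decoded == reading)  # True
```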
At the end of the day, all data on the
modern Internet is really just 0s and 1s.
3:30
The systems that we are about to discuss,
right after this quick break,
3:35
are all built for specific formats
of these bits, these 1s and 0s.
3:39
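To make the "0s and 1s" point concrete, this small sketch shows the bits behind a short piece of text. It assumes UTF-8 encoding, and the function name is made up for this example.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of the text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


print(to_bits("Hi"))  # 01001000 01101001
```

Video, audio, and location data end up as bits in the same way; only the format that gives those bits meaning differs.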
Now keep in mind, there are thousands
of solutions in the world for
3:44
solving problems dealing with big data.
3:48
But before choosing, you need to know
what types of data you are dealing with
3:50
to help you choose the right tool or
framework.
3:54
Let's dive into the major domains of
big data, starting with data storage.
3:57