Did you know that Netflix is responsible for 35% of all the traffic on North American networks?
Terms
Content Delivery Network -- A content delivery network (CDN) is a system of distributed servers that deliver webpages and other web content (like video) to a user based on the geographic locations of the user, the origin of the webpage, and the actual content delivery server.
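To make the "geographic location" part of that definition concrete, here is a minimal sketch of picking the closest delivery server for a user. The server names and coordinates are hypothetical, chosen only for illustration, and real CDNs typically also weigh factors such as server load and network conditions rather than distance alone.

```python
import math

# Hypothetical edge-server locations as (latitude, longitude); illustration only.
EDGE_SERVERS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user_location, servers=EDGE_SERVERS):
    """Return the name of the server geographically closest to the user."""
    return min(servers, key=lambda name: haversine_km(user_location, servers[name]))
```

Calling `nearest_edge((40.7, -74.0))` with a user near New York selects the hypothetical `us-east` server, since it is the shortest great-circle distance away.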
Learn More
- Sandvine Global Internet Phenomena Report (showing Netflix’s traffic share)
- Netflix’s Keystone Data Pipeline
- Netflix Data Pipeline
- Open Source at Netflix (many of the custom tools Netflix has developed and open-sourced for solving Big Data problems at massive scale)
- Other companies using Netflix Open Source technologies
[MUSIC]
0:00
Now that we have a solid understanding
of the general domains and
0:04
associated tools in the world of big data,
0:08
let's begin to look at a specific use case
of these tools applied to real problems.
0:10
Let's take a look at Netflix,
0:15
one of the largest video
content providers in the world.
0:17
Did you know that they are responsible for
0:20
35% of all the traffic on
North American networks?
0:23
Netflix handles millions of
concurrent users on a daily basis.
0:27
They must satisfy numerous
business requirements and
0:31
service level agreements to keep
their network latencies low,
0:34
their content recommendations accurate,
and their services online.
0:38
All of their efforts need to scale to
millions of users, so they invested in
0:42
many of the technologies that we've
discussed throughout this course, and
0:46
have even developed many
of their own in-house.
0:50
Most of the tools that they've
created have been open sourced and
0:53
are in heavy use across other
companies dealing with similar issues,
0:56
companies like Yelp, IBM,
Squarespace, Yahoo, and many more.
1:00
So before we dive into how
the issues were solved,
1:05
let's think a bit about what
their issues are specifically.
1:08
The first issue I see is storage.
1:12
There are a ton of movies and
shows on Netflix, right?
1:14
Well, they need to store massive amounts
of video data in a way that can be
1:18
delivered quickly to a user in any part
of the world where the company operates.
1:22
Now to address this problem, Netflix has
had to invest in major content delivery
1:27
networks, or CDNs, database storage
systems, caching systems, and
1:32
messaging pipelines to handle passing the
content around their internal networks and
1:37
to their users.
1:41
Next, that sure is a lot of active users.
1:43
Processing all those incoming
user interactions across many
1:47
different clients, from laptops to mobile
phones, to Apple TVs, cable boxes,
1:50
and video game systems is a challenge in itself.
1:54
They have to provide recommendations in
real time for which content to watch next.
1:55
Other non-user-based
actions like monitoring and
2:01
logging need to be aggregated
across all these clients as well,
2:04
creating many streams of data that need
to be processed in the back end, and
2:07
acted upon by other systems in order to
keep the systems online and running well.
2:11
To address this problem, Netflix relies on
a variety of stream processing services,
2:16
searching systems, and
machine learning tools.
2:21
The next obvious problem is running
the infrastructure needed to serve
2:24
all the traffic and users.
2:28
Netflix has to scale out thousands of
machines across many geographic regions,
2:29
using Amazon Web Services,
Google Cloud Platform, and
2:34
self-hosted services to process,
store, search,
2:37
and learn from the data produced
by clients and backend systems.
2:40
To address this problem,
2:44
Netflix uses several of the infrastructure
tools we have discussed.
2:45
Let's look more specifically into how they
do that, right after this quick break.
2:49