Did you know that Netflix is responsible for 35% of all the traffic on North American networks?
Content Delivery Network -- A content delivery network (CDN) is a system of distributed servers that deliver webpages and other web content (like video) to a user based on the geographic locations of the user, the origin of the webpage, and the actual content delivery server.
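To make the "geographic location" part of that definition concrete, here is a minimal sketch of how a request might be routed to the closest server. It is a simplification under stated assumptions: the server names and coordinates are hypothetical, and real CDNs also weigh factors such as server load, network topology, and cache contents, none of which are modeled here.

```python
import math

# Hypothetical server locations as (latitude, longitude) in degrees.
# These names and coordinates are illustrative only, not any real CDN's layout.
SERVERS = {
    "origin-us-east": (39.0, -77.5),
    "edge-london": (51.5, -0.1),
    "edge-tokyo": (35.7, 139.7),
    "edge-sao-paulo": (-23.5, -46.6),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_server(user_location, servers=SERVERS):
    """Return the name of the server geographically closest to the user."""
    return min(servers, key=lambda name: haversine_km(user_location, servers[name]))


if __name__ == "__main__":
    # A user near Paris is routed to the London edge rather than the US origin.
    print(nearest_server((48.9, 2.35)))
```

Choosing by distance alone is only a stand-in for the idea that content is served from a location near the user instead of from a single origin.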
- Sandvine Global Internet Phenomena Report (showing Netflix’s traffic share)
- Netflix’s Keystone Data Pipeline
- Netflix Data Pipeline
- Open Source at Netflix (many of the custom tools Netflix has developed and open-sourced for solving Big Data problems at massive scale)
- Other companies using Netflix Open Source technologies
[MUSIC]

0:00 Now that we have a solid understanding of the general domains and associated tools in the world of big data, let's begin to look at a specific use case of these tools applied to real problems.

0:15 Let's take a look at Netflix, one of the largest video content providers in the world. Did you know that they are responsible for 35% of all the traffic on North American networks? Netflix handles millions of concurrent users on a daily basis. They must satisfy numerous business requirements and service-level agreements to keep their network latencies low, their content recommendations accurate, and their services online.

0:42 All of their efforts need to scale to millions of users, so they invested in many of the technologies that we've discussed throughout this course, and have even developed many of their own in-house. Most of the tools that they've created have been open sourced and are in heavy use across other companies dealing with similar issues, companies like Yelp, IBM, Squarespace, Yahoo, and many more.

1:05 So before we dive into how the issues were solved, let's think a bit about what their issues are specifically.

1:12 The first issue I see is storage. There are a ton of movies and shows on Netflix, right? Well, they need to store massive amounts of video data in a way that can be delivered quickly to a user in any part of the world where the company operates. Now to address this problem, Netflix has had to invest in major content delivery networks, or CDNs, database storage systems, caching systems, and messaging pipelines to handle passing the content around their internal networks and to their users.

1:43 Next, that sure is a lot of active users. They have to process all those incoming user interactions across many different clients, from laptops to mobile phones to Apple TVs, cable boxes, and video game systems. They have to provide recommendations in real time for which content to watch next. Other non-user-based actions like monitoring and logging need to be aggregated across all these clients as well, creating many streams of data that need to be processed in the backend and acted upon by other systems in order to keep the systems online and running well. To address this problem, Netflix relies on a variety of stream processing services, searching systems, and machine learning tools.

2:24 The next obvious problem is running the infrastructure needed to serve all the traffic and users. Netflix has to scale out thousands of machines across many geographic regions, using Amazon Web Services, Google Cloud Platform, and self-hosted services to process, store, search, and learn from the data produced by clients and backend systems. To address this problem, Netflix uses several of the infrastructure tools we have discussed. Let's look more specifically into how they do that, right after this quick break.