How Does Netflix Apply Big Data Tools to Solve These Problems? (3:59) with Craig Dennis and Jared Smith
Let's take a look at how Netflix solves their issues.
- Docker -- Docker is an open-source project that automates the deployment of applications inside software containers, and runs on top of Linux, Windows, Mac, and other operating systems.
- Anomaly Detection -- Anomaly detection is the process of looking for abnormalities in data to discover potentially interesting insights, ranging from security incidents to service failures.
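One simple way to illustrate anomaly detection is a z-score check: flag any data point that sits too many standard deviations from the mean. This is a minimal statistical sketch, not Netflix's actual detection system; the function name, sample data, and threshold are all illustrative.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Flag points whose z-score exceeds the threshold.

    A toy stand-in for the statistical models a real
    anomaly-detection pipeline would use.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Steady service latencies (ms) with one obvious spike:
latencies = [102, 98, 101, 99, 100, 97, 103, 500]
print(find_anomalies(latencies))  # → [500]
```

Real systems use far more robust models (seasonality-aware, streaming, self-tuning), but the core idea is the same: define "normal" from the data, then surface whatever falls outside it.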
Netflix applies many of the tools and frameworks from the domains of big data that we have discussed to solve the problems of operating at such a massive scale. From processing frameworks, to storage systems, to infrastructure tools, Netflix exemplifies many of the actual use cases of big data tools. So let's dive into three particular examples of how Netflix applies tools that we've discussed to solve their problems.

First, let's take a look at the data pipeline. Netflix ingests data using Apache Kafka. It has two sets of Kafka clusters. The first cluster is at the front end and gets its messages from clients that produce messages, which is basically every instance of the Netflix application. It sends those messages to the back-end services using the second Kafka cluster, which consumes messages and sends them to services like Apache Spark and Hadoop for processing, as well as to other Netflix custom applications. The front-end Kafka cluster also passes data to storage layers, such as Amazon S3, and to the search tool, Elasticsearch. This is used both for storing data in short- and long-term storage on Amazon Web Services and for searching across all events using Elasticsearch.

To ensure resiliency of the Kafka clusters, there is a replication factor of 2 for all the messages in the system, which means that they will always have a backup for every message in the system. Messages are retained in the Kafka system for up to 24 hours, even after being sent downstream or to storage layers. In total, Netflix operates 36 Kafka clusters, which consume an average of 700 billion messages per day, and up to 8 million events per second.

Netflix uses machine learning to produce recommendations for each of their users based on what videos they have watched and liked.
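The fan-out described above can be sketched with in-memory queues standing in for the Kafka clusters and downstream sinks. This is purely illustrative; the class and attribute names are invented here, and a real producer would use a Kafka client library rather than Python queues.

```python
from collections import deque

class Pipeline:
    """Toy model of the two-tier pipeline: a fronting queue that
    fans events out to a consumer queue and to storage/search sinks."""

    def __init__(self):
        self.fronting = deque()   # stands in for the fronting Kafka cluster
        self.consumer = deque()   # stands in for the consumer Kafka cluster
        self.s3 = []              # stands in for S3 storage
        self.search = []          # stands in for Elasticsearch

    def produce(self, event):
        self.fronting.append(event)

    def route(self):
        # Drain the fronting tier, copying each event everywhere it goes.
        while self.fronting:
            event = self.fronting.popleft()
            self.consumer.append(event)  # onward to Spark/Hadoop consumers
            self.s3.append(event)        # short- and long-term storage
            self.search.append(event)    # indexed for searching across events

pipe = Pipeline()
pipe.produce({"user": 1, "action": "play"})
pipe.produce({"user": 2, "action": "pause"})
pipe.route()
print(len(pipe.consumer), len(pipe.s3), len(pipe.search))  # → 2 2 2
```

The key structural point this captures is that every event reaches multiple destinations: the consumer tier for processing, plus storage and search, which is why retention and replication on the fronting tier matter so much.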
So let's look at the problem of serving those recommendations. Each of the Netflix clients produces messages, which are ingested by a Kafka cluster and then sent to the back end for processing offline with Apache Hadoop and Amazon Web Services. Netflix also uses Apache Spark, with Spark's streaming extension, as well as their own version of Apache Storm, to produce new recommendations with sub-minute latency. When their systems need to retrieve prior results from the machine learning algorithms, the back end fetches the results from a combination of MySQL, Cassandra, and Netflix's own caching engine, which they have also open sourced: EVCache, which stands for Ephemeral Volatile Cache.

In order to run these services at such a massive scale, Netflix utilizes various other infrastructure tools and frameworks. A major open-source infrastructure tool in heavy use at Netflix is Apache Mesos. Mesos is a resource management service that can be used to run clusters of machines, schedule jobs and services, and much more. It runs thousands of containers and services at Netflix that help to keep the back-end and user-facing services online. Specifically, it runs a mix of batch, stream processing, and service-style workloads. Their use cases for Mesos include real-time anomaly detection, training and model-building batch jobs, machine learning orchestration, and Node.js-based microservices.

Netflix has also built several internal big data tools for resource management. Some examples are Mantis, a reactive stream processing engine that processes up to 8 million events per second; Titus, a Docker container job management and execution platform; and Meson, a general-purpose workflow orchestration and scheduling framework that powers hundreds of concurrent machine learning pipelines.
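Fetching prior results from a cache backed by a datastore typically follows the cache-aside pattern: check the cache first, fall back to the datastore on a miss, then populate the cache for the next read. Here is a minimal sketch with a dict standing in for a cache like EVCache and a function standing in for the MySQL/Cassandra lookup; all names and sample data are illustrative, not Netflix's actual API.

```python
cache = {}          # stands in for the caching layer (e.g., EVCache)
datastore = {       # stands in for persisted results in MySQL/Cassandra
    "user-42": ["Stranger Things", "Dark"],
}

def fetch_from_datastore(user_id):
    # In a real system this would be a database query.
    return datastore.get(user_id, [])

def get_recommendations(user_id):
    """Cache-aside read: serve from cache when possible."""
    if user_id in cache:
        return cache[user_id]                 # cache hit
    result = fetch_from_datastore(user_id)    # cache miss: go to the datastore
    cache[user_id] = result                   # warm the cache for next time
    return result

print(get_recommendations("user-42"))  # miss: fetched from datastore, then cached
print(get_recommendations("user-42"))  # hit: served straight from the cache
```

The trade-off is that cached entries can go stale, which is why a cache for fast-changing data like recommendations is deliberately "ephemeral" and "volatile": entries are expected to expire and be recomputed.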
Netflix uses a wide range of big data tools across their entire stack, and is therefore a great example of how far an organization can take the things that we've learned about in this course.