When you start dealing with large datasets, the first question you should ask is “What kind of data are we dealing with?”
- GIS -- Geographic Information Systems
- Relational Database -- A database that stores data according to a predefined schema.
- SQL -- Structured Query Language
- NoSQL -- An overloaded term for non-relational databases
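The glossary defines a relational database as one that stores data according to a predefined schema. Python's built-in `sqlite3` module can illustrate this. This is a minimal sketch, not a description of any particular production setup. The `customers` and `purchases` tables, their columns, and the sample rows are all hypothetical, chosen to echo the lesson's example of customers and their buying habits.

```python
import sqlite3

# In-memory SQLite database; the tables and columns are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchases (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL,
        amount      REAL NOT NULL
    );
    """
)

# Hypothetical sample rows; every row must fit the schema declared above.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO purchases (customer_id, item, amount) VALUES (?, ?, ?)",
    [(1, "keyboard", 49.99), (1, "mouse", 19.99), (2, "monitor", 199.0)],
)

# The schema is enforced: a customer row without the required name is rejected.
rejected = False
try:
    conn.execute("INSERT INTO customers (id) VALUES (3)")
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

# Because every row has the same named fields, SQL can join and aggregate them.
rows = conn.execute(
    """
    SELECT c.name, COUNT(p.id), SUM(p.amount)
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
    """
).fetchall()
for name, count, total in rows:
    print(f"{name}: {count} purchase(s), {total:.2f} total")
```

The fixed schema is what makes the join and the aggregation straightforward. It is also why such a database suits data whose structure won't change very often.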
When you start dealing with large datasets, the first question you should ask is, what kind of data are we dealing with? Is it structured text data? Or is it unstructured? Is data missing? Is it not text at all? Is it video? Or is it audio? Does it have location, or geospatial, information tied to it? Are there a lot of default values in the data? The list goes on and on.

To really make sense of the data you have on hand, and to begin solving business problems with it, the first step is to recognize the type of data you have and to put it into the appropriate systems for storing that data. For instance, if your data is something like a group of customers and their buying habits, it will probably fit best in a relational SQL database or a document-based NoSQL database. If your data is structured like a social network, describing how things are interconnected, you probably want to use a graph database. And if you have thousands of videos to process, you need to know that before you try to store them in systems not equipped for the high bandwidth needed to transfer that data in and out.

Let's explore the major types of data that you may encounter.

Structured data is data that is formatted according to a specific, predefined structure. This means we can often separate the data into fields that we can access individually. Structured data makes the most sense for a relational database, where the structure won't change very often. Some common examples of structured data are application logs, customer information, and financial data.

Conversely, unstructured data is data that cannot easily fit into one or more defined labels. For instance, when Twitter analyzes tweets for malicious content, it can never be certain of exactly what the data in the tweets represents.
Words and content can mean very different things, and because there is so much data to analyze, its unstructured nature makes processing it even harder. Common examples of unstructured data include social media posts, books, and healthcare data.

One of the most common forms of unstructured data, video and audio, actually deserves its own type. This kind of data requires large amounts of bandwidth to process and store, and compression is almost always necessary. Examples range from Netflix to social media posts, and video conferencing apps also produce and consume this type of data.

A common request is to store location (geospatial) data or other metadata alongside the rest of the information. When this happens, a world of possibilities opens up. It becomes possible to analyze everything from simple location tracking to how something interacts with a product in different places over time. A common example is any app that tracks your location, such as a mapping or directions app like Waze or Google Maps. The same kind of data can also be used to analyze a company's fleet of trucks, or even for military purposes: think of drones tracking soldiers on the ground.

The Internet of Things, or IoT for short, has put connected devices and sensors all over the world. These range from weather sensors that warn of approaching tornadoes all the way to fitness apps: the accelerometer in every modern smartphone can now record activity information and send it to a plethora of apps for optimizing your workouts. Sensors can transmit all kinds of data, from structured text to video and audio. Other use cases include vehicle communication systems, smart home devices, and traffic monitoring systems.

At the end of the day, all data on the modern Internet is really just 0s and 1s.
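Location tracking and truck-fleet analysis, mentioned above for geospatial data, usually come down to computing distances between latitude/longitude points. The sketch below uses the haversine formula on a spherical Earth. That is a common approximation, not necessarily what any mapping product or GIS actually uses. The function names and the sample GPS fixes are hypothetical.

```python
import math

# Mean Earth radius in km; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(points):
    """Total distance along an ordered list of (lat, lon) fixes, e.g. one truck's route."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


# Hypothetical GPS fixes (latitude, longitude) reported by one delivery truck.
fixes = [(40.7128, -74.0060), (40.7306, -73.9352), (40.6782, -73.9442)]
print(f"Distance driven: {track_length_km(fixes):.1f} km")
```

Summing leg distances per vehicle is the simplest kind of fleet metric. Real geospatial systems typically add spatial indexes so that questions like "which trucks are near this point?" do not require scanning every record.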
The systems we are about to discuss, right after this quick break, are all built for specific formats of these bits, these 1s and 0s. Keep in mind that there are thousands of solutions in the world for solving problems with big data. But before choosing one, you need to know what types of data you are dealing with, so you can choose the right tool or framework. Let's dive into the major domains of big data, starting with data storage.
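The lesson's point that social-network-shaped data, which is about how things are interconnected, suits a graph database can be made concrete with a "friends within N hops" query. In a relational table this usually takes one self-join per hop. A graph store is built to follow edges directly. The plain-Python breadth-first search below only sketches that traversal over a hypothetical adjacency list. It is not any graph database's API.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {person: hops} for everyone reachable from start in 1..max_hops hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]  # the starting person is not their own connection
    return seen


# Hypothetical undirected friendship graph stored as an adjacency list.
friends = {
    "ana": ["ben", "cai"],
    "ben": ["ana", "dee"],
    "cai": ["ana", "dee"],
    "dee": ["ben", "cai", "eli"],
    "eli": ["dee"],
}

print(within_hops(friends, "ana", 2))  # {'ben': 1, 'cai': 1, 'dee': 2}
```

Because breadth-first search visits people in order of distance, each recorded hop count is the shortest path length from the starting person.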