1 00:00:00,263 --> 00:00:01,739 Welcome back. 2 00:00:01,739 --> 00:00:05,104 Now that we have seen a little bit about how Matplotlib works, 3 00:00:05,104 --> 00:00:07,897 let's talk briefly about different chart types. 4 00:00:07,897 --> 00:00:10,407 As we work through the rest of this course, 5 00:00:10,407 --> 00:00:13,836 you will get a better sense of how and when to use each type. 6 00:00:13,836 --> 00:00:16,546 Charts come in a wide range of visual styles and 7 00:00:16,546 --> 00:00:18,990 are used to represent different things. 8 00:00:18,990 --> 00:00:22,014 Matplotlib supports a lot of options. 9 00:00:22,014 --> 00:00:26,889 Line, bar, pie, scatter plot, histogram, box plot, 10 00:00:26,889 --> 00:00:32,676 heatmaps and candlestick charts are some of the more common charts. 11 00:00:32,676 --> 00:00:36,548 In this course, we'll be showcasing scatter, histogram and 12 00:00:36,548 --> 00:00:38,308 box plots with Matplotlib. 13 00:00:38,308 --> 00:00:40,721 I'll touch on some of the others as well, and 14 00:00:40,721 --> 00:00:44,732 I've included links in the teacher's notes to other charts for further reading. 15 00:00:44,732 --> 00:00:49,111 Let's talk about the whens and whys of some of these chart types. 16 00:00:49,111 --> 00:00:53,014 A line chart is used to identify trends or patterns in data and 17 00:00:53,014 --> 00:00:56,084 is commonly used for exploring trends over time. 18 00:00:56,084 --> 00:00:59,579 It can also be used to compare multiple groups by using different lines. 19 00:00:59,579 --> 00:01:04,303 For example, the total sales of several products over a period of time. 20 00:01:04,303 --> 00:01:08,040 A bar chart is most effective when comparing categories of data. 21 00:01:08,040 --> 00:01:12,742 Bar charts can also be used, like a line chart, for tracking changes over time. 22 00:01:12,742 --> 00:01:17,542 When used in this fashion, they are best applied to large changes in the data. 
23 00:01:17,542 --> 00:01:20,024 A bar chart will have two axes. 24 00:01:20,024 --> 00:01:23,974 One typically holds numerical data and the other a category. 25 00:01:23,974 --> 00:01:24,790 For example, 26 00:01:24,790 --> 00:01:29,215 the number of different types of apples sold each week at a farmers' market. 27 00:01:29,215 --> 00:01:34,932 Or, for a time-based example, the total population of the world since 1000 BCE. 28 00:01:34,932 --> 00:01:39,904 A pie chart is best used when comparing parts of a whole at a snapshot in time, 29 00:01:39,904 --> 00:01:41,935 instead of a change over time. 30 00:01:41,935 --> 00:01:44,684 A pie chart would answer questions like, what is 31 00:01:44,684 --> 00:01:49,057 the percentage of each type of apple sold last week at our farmers' market? 32 00:01:49,057 --> 00:01:54,989 A pie chart will easily show that, for example, 23% were Red Delicious, 33 00:01:54,989 --> 00:01:59,655 18% were Golden Delicious and 15% were Granny Smith. 34 00:01:59,655 --> 00:02:04,051 Scatter plots are similar to line charts in that they both have a horizontal and 35 00:02:04,051 --> 00:02:04,993 vertical axis. 36 00:02:04,993 --> 00:02:09,513 However, scatter plots are used to show how much one variable is 37 00:02:09,513 --> 00:02:12,456 impacted by another, or their correlation. 38 00:02:12,456 --> 00:02:16,424 We use scatter plots to show relationships between values. 39 00:02:16,424 --> 00:02:21,137 The plotted points are markers, which can be different sizes to showcase importance, 40 00:02:21,137 --> 00:02:24,154 and different colors to show specific data buckets. 41 00:02:24,154 --> 00:02:28,261 A scatter plot allows us to quickly visualize the distribution of the data and 42 00:02:28,261 --> 00:02:29,687 notice any outliers. 43 00:02:29,687 --> 00:02:32,786 We can see if there's a positive, negative, or 44 00:02:32,786 --> 00:02:37,715 non-existent correlation between data based on the scatter plot results. 
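The scatter plot description above (marker size for importance, color for data buckets, and a visible correlation) can be sketched roughly as follows. This is a minimal illustration, not code from the course: the variable names and the randomly generated data are assumptions made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 50 points where y rises with x, plus some noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 50)
score = 5 * hours + rng.normal(0, 5, 50)
importance = rng.uniform(20, 200, 50)  # marker area, standing in for "importance"
bucket = rng.integers(0, 3, 50)        # three made-up data buckets, shown by color

fig, ax = plt.subplots()
points = ax.scatter(hours, score, s=importance, c=bucket, cmap="viridis")
ax.set_xlabel("hours (hypothetical)")
ax.set_ylabel("score (hypothetical)")

# Pearson correlation coefficient: near +1 is positive, near -1 negative, near 0 none.
r = np.corrcoef(hours, score)[0, 1]
ax.set_title(f"correlation r = {r:.2f}")
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` to see the figure.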
45 00:02:37,715 --> 00:02:42,172 Histograms look like bar charts; however, looks can be deceiving, as they 46 00:02:42,172 --> 00:02:45,936 are not the same and are indeed used for different purposes. 47 00:02:45,936 --> 00:02:50,428 Histograms are used to show the distribution of a variable, while bar charts 48 00:02:50,428 --> 00:02:52,646 are used to compare variables. 49 00:02:52,646 --> 00:02:56,723 Unlike a bar chart, a histogram won't have gaps between bars. 50 00:02:56,723 --> 00:03:00,780 Empty bins are possible, however, if there are no data points for 51 00:03:00,780 --> 00:03:02,019 a particular range of values. 52 00:03:02,019 --> 00:03:06,437 In a histogram, the data are split into different intervals, or 53 00:03:06,437 --> 00:03:10,790 bins, to show the frequency distribution of continuous data. 54 00:03:10,790 --> 00:03:15,007 This allows for the inspection of the data distribution, and 55 00:03:15,007 --> 00:03:17,450 will show outliers or skewed data. 56 00:03:17,450 --> 00:03:21,638 When using a histogram, choosing an appropriate number of bins and 57 00:03:21,638 --> 00:03:25,761 their width is important for meaningful and accurate reporting. 58 00:03:25,761 --> 00:03:29,466 Box plots, sometimes called box and whisker plots, 59 00:03:29,466 --> 00:03:33,435 combine the functionality of the bar chart with that of a histogram. 60 00:03:33,435 --> 00:03:38,314 They allow for the quick examination of and comparison between different sets of 61 00:03:38,314 --> 00:03:42,269 data while displaying statistical information about the data. 62 00:03:42,269 --> 00:03:45,762 A box plot allows for visualizing the minimum, first quartile, 63 00:03:45,762 --> 00:03:49,562 median, third quartile, and maximum values of a data set. 64 00:03:49,562 --> 00:03:53,530 Wow, that's a lot like a high school statistics class. 
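To make the histogram bins and the box plot statistics described above more concrete, here is a small sketch. The data set is randomly generated for illustration only, and the choice of 20 bins is an arbitrary assumption, not a recommendation from the course.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical continuous data: 500 values centered around 50.
rng = np.random.default_rng(1)
data = rng.normal(50, 10, 500)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: split the data into 20 equal-width bins and count the values in each.
counts, edges, _ = ax_hist.hist(data, bins=20)
ax_hist.set_title("Histogram (20 bins)")

# Box plot: the box spans the first to third quartile with a line at the median.
# Note: by default Matplotlib's whiskers stop at 1.5x the interquartile range,
# and points beyond them are drawn individually as outliers.
ax_box.boxplot(data)
ax_box.set_title("Box plot")

# The same summary statistics, computed directly with NumPy.
q1, median, q3 = np.percentile(data, [25, 50, 75])
minimum, maximum = data.min(), data.max()
```

Changing `bins=20` to a much smaller or larger number is a quick way to see how the bin choice alters the apparent shape of the distribution.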
65 00:03:53,530 --> 00:03:57,625 Put more simply, it allows us to see the overall distribution, 66 00:03:57,625 --> 00:04:00,628 central value and variability of a data set. 67 00:04:00,628 --> 00:04:03,646 Much like a histogram, choosing the number of bins and 68 00:04:03,646 --> 00:04:07,085 their width is an important consideration for reporting. 69 00:04:07,085 --> 00:04:12,278 Finally, I'd also like to briefly touch on heatmaps and candlestick charts. 70 00:04:12,278 --> 00:04:14,188 While we won't be using them in this course, 71 00:04:14,188 --> 00:04:16,028 you're likely to come across them. 72 00:04:16,028 --> 00:04:20,785 A heatmap is a chart in which the area inside recognized boundaries is shaded in 73 00:04:20,785 --> 00:04:23,462 proportion to the data being represented. 74 00:04:23,462 --> 00:04:27,218 For example, you could have a heatmap representing population density. 75 00:04:27,218 --> 00:04:31,539 Countries with higher populations will be represented with different colors than 76 00:04:31,539 --> 00:04:33,458 countries with lower populations. 77 00:04:33,458 --> 00:04:37,015 Candlestick charts are heavily used in the financial sector. 78 00:04:37,015 --> 00:04:41,584 While they bear a resemblance to box plots, that's where their similarities end. 79 00:04:41,584 --> 00:04:46,121 Each candlestick will typically show one day of price movement of a stock, 80 00:04:46,121 --> 00:04:47,713 currency or derivative. 81 00:04:47,713 --> 00:04:51,225 It's like a combination of a line and bar chart, showing 82 00:04:51,225 --> 00:04:55,972 a trend over time while also showing the daily information for the data. 83 00:04:55,972 --> 00:05:00,475 Specifically, they show the open, close, high, and low values for 84 00:05:00,475 --> 00:05:04,901 a security and are a cornerstone of financial technical analysis. 
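As a rough illustration of the open, close, high, and low values a candlestick shows, the sketch below draws simple candlesticks from basic Matplotlib primitives (`bar` for the body, `vlines` for the wick). This is one possible way to assemble the shape by hand, not a dedicated candlestick function, and the five days of prices are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical prices for five trading days.
days = np.arange(5)
open_ = np.array([10.0, 10.5, 10.2, 11.0, 10.8])
close = np.array([10.5, 10.2, 11.0, 10.8, 11.4])
high = np.maximum(open_, close) + 0.3
low = np.minimum(open_, close) - 0.3

# Green when the price closed at or above the open, red when it closed lower.
colors = ["green" if c >= o else "red" for o, c in zip(open_, close)]

fig, ax = plt.subplots()
# Wick: a thin vertical line from the day's low to the day's high.
wicks = ax.vlines(days, low, high, color="black", linewidth=1)
# Body: a bar from the open to the close (negative height when the price fell).
bodies = ax.bar(days, close - open_, bottom=open_, width=0.6, color=colors, zorder=3)
ax.set_xlabel("day (hypothetical)")
ax.set_ylabel("price")
```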
85 00:05:04,901 --> 00:05:07,609 Wow, that's a lot of charting options, and 86 00:05:07,609 --> 00:05:11,825 it only scratches the surface of the charts available in Matplotlib. 87 00:05:11,825 --> 00:05:15,806 It's also just the beginning of the when and why to use each chart. 88 00:05:15,806 --> 00:05:19,823 Choosing the chart type is predominantly determined by the questions you have about your 89 00:05:19,823 --> 00:05:23,205 data and how to best represent your data for the intended audience. 90 00:05:23,205 --> 00:05:25,980 In addition to picking the proper chart type, 91 00:05:25,980 --> 00:05:30,688 there is one other aspect of reporting that is important to remember: scale. 92 00:05:30,688 --> 00:05:34,210 These are the values you mark on the axis to show the relationship between 93 00:05:34,210 --> 00:05:36,043 the units that are being measured. 94 00:05:36,043 --> 00:05:40,835 Often, we may want to utilize multiple chart types to showcase our data. 95 00:05:40,835 --> 00:05:44,698 When doing so, we need to keep scale in mind across our charts, so 96 00:05:44,698 --> 00:05:47,526 that our data viz efforts aren’t misleading. 97 00:05:47,526 --> 00:05:51,876 We’ll examine this a bit more as we look at more charting options with Matplotlib. 98 00:05:51,876 --> 00:05:54,609 Now is a good time to take a short break. 99 00:05:54,609 --> 00:05:57,825 Get up and stretch a bit before we look at a real world dataset and 100 00:05:57,825 --> 00:06:00,510 see how to visualize it using a variety of charts.
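The point about keeping scale consistent across charts can be sketched with two side-by-side bar charts that share a y-axis. The apple sales counts below are made up for illustration; `sharey=True` is one way to keep the two charts comparable, and other approaches (such as setting the limits by hand) would also work.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical counts of apples sold in two different weeks.
varieties = ["Red Delicious", "Golden Delicious", "Granny Smith"]
week_1 = [12, 9, 7]
week_2 = [40, 35, 30]

# sharey=True forces both charts onto the same vertical scale, so the
# week 1 bars look as small as they really are next to the week 2 bars.
fig, (ax_a, ax_b) = plt.subplots(1, 2, sharey=True, figsize=(9, 3))
ax_a.bar(varieties, week_1)
ax_a.set_title("Week 1")
ax_b.bar(varieties, week_2)
ax_b.set_title("Week 2")
ax_a.set_ylabel("apples sold (hypothetical)")
```

Without `sharey=True`, each chart would pick its own axis range, and the two weeks could look similar at a glance even though the totals differ several times over.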