1 00:00:00,000 --> 00:00:04,350 [MUSIC] 2 00:00:04,350 --> 00:00:06,478 [SOUND] Hi I'm Danielle. 3 00:00:06,478 --> 00:00:08,610 A data scientist here at Treehouse. 4 00:00:08,610 --> 00:00:13,180 Window functions are a powerful feature available in some variations of SQL. 5 00:00:13,180 --> 00:00:17,320 They let you analyze a row within the context of an entire result set. 6 00:00:17,320 --> 00:00:21,640 In other words, you can compare one row to the other rows in a query. 7 00:00:21,640 --> 00:00:26,280 For example, you can use them to calculate each row as a percent of the total or 8 00:00:26,280 --> 00:00:27,850 to add a moving average to a report. 9 00:00:28,920 --> 00:00:31,988 Say, you have a report listing your company's monthly sales by region, 10 00:00:31,988 --> 00:00:36,050 one row of results per month per region. 11 00:00:36,050 --> 00:00:38,900 If you want to track the performance of region over time, 12 00:00:38,900 --> 00:00:41,820 looking at the sales number itself can be misleading. 13 00:00:41,820 --> 00:00:44,010 The region sales may go up in one month, but 14 00:00:44,010 --> 00:00:48,280 if the total sales went up more, that region could still be underperforming. 15 00:00:48,280 --> 00:00:51,550 You can use window functions to calculate each region sales 16 00:00:51,550 --> 00:00:53,810 as a percentage of total monthly sales, and 17 00:00:53,810 --> 00:00:57,640 gauge performance, by tracking how the region's share of sales changes over time. 18 00:00:58,900 --> 00:01:01,620 You can also use window functions to aggregate on multiple 19 00:01:01,620 --> 00:01:05,340 levels within one query, just by using a function call. 20 00:01:05,340 --> 00:01:08,430 If you want to see a sum for row as well as for a category, 21 00:01:08,430 --> 00:01:11,060 that the row falls into, window functions can do that. 22 00:01:12,080 --> 00:01:15,520 In the sales report example, you could compile your regional totals and 23 00:01:15,520 --> 00:01:17,890 monthly totals in the same query. 24 00:01:17,890 --> 00:01:21,630 This replaces the need for multiple reports, self joins, or 25 00:01:21,630 --> 00:01:24,420 further aggregation outside of the original report. 26 00:01:24,420 --> 00:01:28,051 So it streamlines queries and keeps everything in one place. 27 00:01:28,051 --> 00:01:32,337 SQL Server, Oracle, PostgreSQL and similar database systems, 28 00:01:32,337 --> 00:01:38,060 including Amazon Redshift and Snowflake, all support window functions. 29 00:01:38,060 --> 00:01:40,590 If you're working with any of these variations of SQL, 30 00:01:40,590 --> 00:01:44,220 window functions are a great way to take your SQL skills to the next level. 31 00:01:45,280 --> 00:01:49,760 In this workshop, we'll be exploring a dataset of matches at a chess tournament. 32 00:01:49,760 --> 00:01:52,350 Imagine that you're a reporter for a local paper, and 33 00:01:52,350 --> 00:01:55,680 you're writing an article on the latest community chess tournament. 34 00:01:55,680 --> 00:01:59,010 You want to understand more about how the players did, and you know 35 00:01:59,010 --> 00:02:02,210 that with a little analysis, you'll find some interesting facts to report. 36 00:02:03,250 --> 00:02:04,770 If you'd like to follow along, 37 00:02:04,770 --> 00:02:08,710 you can download the data I'll be using from the downloads tab below this video. 38 00:02:08,710 --> 00:02:11,910 Feel free to load it into your local database and follow along. 39 00:02:11,910 --> 00:02:14,600 I'll be using PostgreSQL in these videos, but 40 00:02:14,600 --> 00:02:17,735 any of the other databases systems listed above should work. 41 00:02:17,735 --> 00:02:22,410 PostgreSQL is an open source database system, you can download it for free. 42 00:02:22,410 --> 00:02:24,170 Check out the teacher's notes for more information. 43 00:02:25,300 --> 00:02:26,220 Let's take a look at the data. 44 00:02:27,380 --> 00:02:30,700 This is the dataset containing the results of the chess tournament. 45 00:02:30,700 --> 00:02:34,260 It contains two records for each match, one line per player. 46 00:02:35,310 --> 00:02:38,950 Each record has the player's ranking and team name. 47 00:02:38,950 --> 00:02:43,220 It also has a checkmate column, which indicates if the player won that match. 48 00:02:43,220 --> 00:02:46,050 It's 1 for a win and 0 for a loss. 49 00:02:46,050 --> 00:02:49,320 We've put together a preliminary report using this dataset. 50 00:02:49,320 --> 00:02:53,320 It shows each player's name, rank, team, 51 00:02:53,320 --> 00:02:57,590 number of matches played, number of wins and win rate. 52 00:02:57,590 --> 00:02:59,420 Here's the underlying SQL for this report. 53 00:03:00,780 --> 00:03:03,810 These aggregates are obtained using regular functions. 54 00:03:03,810 --> 00:03:07,870 So, they give us the count or sum within a report's grouping. 55 00:03:07,870 --> 00:03:11,410 In this case, by the player's name, ranking and team. 56 00:03:13,480 --> 00:03:15,940 The information that we have in the report is valuable but 57 00:03:15,940 --> 00:03:18,300 it leaves a lot of questions unanswered. 58 00:03:18,300 --> 00:03:22,670 It's missing context, which is key to making our article interesting. 59 00:03:22,670 --> 00:03:26,390 We want to know how each player relates to the others in the tournament, so 60 00:03:26,390 --> 00:03:29,878 that our article can provide a comprehensive view of what happened. 61 00:03:29,878 --> 00:03:32,391 [SOUND] The questions that we'll explore are, 62 00:03:32,391 --> 00:03:35,816 how does each player's ranking compare to the overall field? 63 00:03:35,816 --> 00:03:38,850 How many total matches were played at the tournament? 64 00:03:38,850 --> 00:03:41,690 Which team had the highest winning percentage? 65 00:03:41,690 --> 00:03:44,520 Who is the top ranked player on each team? 66 00:03:44,520 --> 00:03:49,369 And finally, do higher-ranked players win more often than lower-ranked ones? 67 00:03:49,369 --> 00:03:53,780 Best of all, we won't need to create additional reports to find the answers. 68 00:03:53,780 --> 00:03:56,890 Window functions will help us answer all of these questions 69 00:03:56,890 --> 00:03:58,030 within our original report.