Grouping Data With Partitions5:59 with Danielle Kiowski
Supercharge your window function with partitions. Partitions let you compare a row to a subset of results from a query so that you can compare one row to a grouping or category of other, related rows.
PARTITION BY — Introduces the partition clause, which divides the result set into groups. Remember that partitioning expressions must return one value per row of the result set.
SELECT PLAYER_NAME, PLAYER_RANKING, PLAYER_TEAM, COUNT(MATCH_ID) as MATCHES_PLAYED, SUM(CHECKMATE) as MATCHES_WON, SUM(CHECKMATE)/COUNT(MATCH_ID) as WIN_RATE, AVG(PLAYER_RANKING) OVER () as AVG_RANKING, PLAYER_RANKING - AVG(PLAYER_RANKING) OVER () as DIFF_FROM_AVERAGE, SUM(SUM(CHECKMATE)) OVER () as TOTAL_MATCHES, SUM(MAX(CHECKMATE)) OVER () as PLAYERS_WITH_WINS, MAX(SUM(CHECKMATE)) OVER () as HIGHEST_WINS, SUM(SUM(CHECKMATE)) OVER (PARTITION BY PLAYER_TEAM) / SUM(COUNT(MATCH_ID)) OVER (PARTITION BY PLAYER_TEAM) as TEAM_WIN_RATE, MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE)) as TIEBREAKER, MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE), PLAYER_TEAM) as TIEBREAKER_TEAM FROM treehouse.chess_data_matches GROUP BY PLAYER_NAME, PLAYER_RANKING,PLAYER_TEAM
Sometimes, it makes sense to view a row in the context of a subset of results from 0:00 the query instead of all of the results. 0:04 For instance, with our chess match dataset, 0:08 you might want to know how one player's performance compares to other players 0:10 on the same team rather than to the entire field. 0:14 A partition clause divides the result set into categories 0:17 according to conditions that you provide within the parenthesis in the over clause. 0:21 These groupings are like panes within the window of data that you are querying over. 0:25 Window functions on each row have access to the rows pane. 0:30 Let's use partitions to find out which team had the highest winning rate 0:34 in the tournament. 0:37 We already have a field for the players winning percentage, which is calculated 0:39 as the sum of checkmate divided by the count of match_id. 0:42 For the window function version of this field, we'll start with the numerator, 0:46 the number of wins. 0:50 We'll use sum of sum of checkmate, 0:51 just like we did to get the total number of matches in the tournament. 0:54 We'll add over to make it a window function. 0:58 Inside the parenthesis, will add partition by to start the partition clause. 1:02 Here, we'll put the conditions that the function should use to separate the window 1:08 into panes. 1:13 We want the numerator to be the total matches won by the player's team. 1:14 So we'll partition by player_team. 1:18 We'll do another window function for the denominator, using sum, 1:20 of count, of match_id. 1:24 Again, over partition by player_team. 1:28 We can order by this field to make it easier to see which teams win rate is 1:41 highest. 1:45 We do this by putting an order by clause in the overall query. 1:46 Instead of typing the entire window function again to tell the order by 1:51 clause which field to use for ordering, 1:55 we can use the number of the field in the overall query as a shortcut. 1:57 In our overall query, the window function we just created is the 12th field. 2:01 I'm also going add desc 2:05 to specify that it should order by that field in a descending fashion. 2:08 That way, the team with the highest win rate will be at the top and 2:12 it will be easier for us to see. 2:15 Let's run that. 2:16 Now each row shows the win rate of that player's team. 2:20 We can see that the blue team had the most successful tournament overall. 2:24 The partitioning conditions follow the same rules as the expressions 2:28 that we passed to the window function. 2:31 They must return one value per row of the result set. 2:33 Here, we can partition by player_team because it's in the grouping statement. 2:36 But we couldn't partition by checkmate, for example. 2:41 We could partition by sum of checkmate, so that all players with the same number of 2:43 wins would be grouped together, in one partition. 2:47 This would be helpful if we were trying to figure out a tiebreaker between players, 2:51 who all won the same number of matches. 2:54 For example, you could break the tie by 2:56 identifying the player that had the lowest ranking coming into the tournament, 2:59 because that player performed better relative to their skill level. 3:02 You could say they overcame the greatest odds and should win in the case of a tie. 3:05 Let's look at the minimum ranking for 3:09 players who won the same number of matches. 3:10 We'll do that by creating another window function, with min of player_ranking. 3:13 We can pass player_ranking directly to the min function because it's in the group 3:21 by clause. 3:25 We'll add over to make it a window function. 3:25 And in this case, in our partition clause, 3:29 we're going to partition by the sum of checkmate. 3:32 And that's going to group together all of the players that have the same number of 3:36 wins We'll order the results by the number of matches won so 3:40 that we have all the players who tied for first at the top. 3:47 To do that, we'll replace the 12 from last time by sum of checkmate. 3:51 We also have sum of checkmate in our original query. 3:57 So if we wanted, we could go up here and find that it's the 5th field and 4:00 use 5 instead. 4:04 If we look at players who won all of their matches, here it's the top four rows, 4:05 we find that Robert had the lowest ranking. 4:10 And if we look at the results of our window function, we see that for 4:14 all the players who won all of their matches, the tiebreaker number, 4:18 which is the minimum ranking for all those players, matches Robert's ranking of 2209. 4:21 If you want to partition by multiple items, separate them with commas. 4:27 Let's find the tiebreaker within each team. 4:30 We'll do that by duplicating the tiebreaker field and 4:33 calling it tiebreaker by team. 4:36 In the partition clause, we'll add a partition by player_team, 4:38 right after the partition by the number of wins. 4:42 Let's take another look at the players who won all of their matches. 4:48 Again, these top four rows. 4:52 Two of them, Emily and Kate, were both on the yellow team. 4:54 If we look at their rankings, 4:59 we can see that Emily went into the tournament with the lower player ranking. 5:00 And if we look over at the results of our window function, for the team tiebreaker. 5:04 If we look at the results for those two rows, for 5:09 the two players who won all of their matches on the yellow team. 5:12 The tiebreaker matches Emily's ranking. 5:14 Partitions are similar to the group by clause that we use with normal 5:17 aggregating functions. 5:20 When we group rows to use a normal function, 5:22 the rows that we're calculating over are all collapsed into one record. 5:24 That happens in the original report, 5:28 where all of the matches are collapsed by player. 5:30 Window functions, on the other hand, are different and more powerful. 5:32 They leave the underlying records from the query intact. 5:36 This means that we can use different partitions to compare different groups of 5:39 data in the same report. 5:42 In our example, we created window functions with three different partitions 5:43 and included them all in the same query. 5:47 Without window functions, 5:50 we need multiple queries, each with its own group by clause. 5:51 In other words, window functions let us write one report instead of three. 5:55
You need to sign up for Treehouse in order to download course files.Sign up