Moving Beyond the Row: The OVER Keyword4:36 with Danielle Kiowski
The OVER keyword lets you transform an aggregate function into a window function. This lets an aggregate function see, and access, all of the results of a query.
OVER — keyword used to create window functions. Follow with empty parentheses to access the entire result set.
SELECT PLAYER_NAME, PLAYER_RANKING, PLAYER_TEAM, COUNT(MATCH_ID) as MATCHES_PLAYED, SUM(CHECKMATE) as MATCHES_WON, SUM(CHECKMATE)/COUNT(MATCH_ID) as WIN_RATE, AVG(PLAYER_RANKING) OVER () as AVG_RANKING, PLAYER_RANKING - AVG(PLAYER_RANKING) OVER () as DIFF_FROM_AVERAGE, SUM(SUM(CHECKMATE)) OVER () as TOTAL_MATCHES, SUM(MAX(CHECKMATE)) OVER () as PLAYERS_WITH_WINS, MAX(SUM(CHECKMATE)) OVER () as HIGHEST_WINS FROM treehouse.chess_data_matches GROUP BY PLAYER_NAME, PLAYER_RANKING,PLAYER_TEAM
When we use aggregate functions like sum, count, or 0:00 average in a query, they calculate a result using 0:04 the records that are associated with their row by the GROUP BY statement. 0:07 They can't access the values of any records outside of their grouping. 0:11 So the values that they return are placed in the context of the overall results 0:15 with the over keyword we can open a Window so 0:19 that these aggregate functions can see and access all the results of the query. 0:22 This transforms a regular aggregate function into a Window function. 0:27 Using just the over keyword, 0:31 we can add information about the entire tournament to our existing report. 0:33 To answer our first question, 0:37 how does each player's ranking compare to the overall field. 0:39 We calculate the average ranking of all the players in the tournament. 0:42 To do this, we'll add a new field to the report. 0:45 Let's start out by adding the AVG(PLAYER_RANKING) AS AVG_RANKING. 0:49 This gives us a different number for 1:03 each player because the report is grouped by player. 1:05 The average function returns the player's own ranking on each row. 1:08 That's not what we want, we want the average across all the rows. 1:12 Let's have the over keyword to give the function access to all the rows 1:15 then we'll add empty parentheses. 1:20 The parentheses create a space. 1:22 To give the window function more instructions for 1:24 how to aggregate which we'll cover later. 1:26 For now, we'll leave those options blank. 1:29 So that we're querying over the whole result set. 1:31 Now, when we run the report, it's easy to see whether each player is above or 1:34 below the average ranking which is 1,738.7. 1:39 We can see this directly by subtracting that average from the player's ranking. 1:42 Let's make a new field to do that. 1:47 We'll start out with the player's own ranking. 1:51 Player ranking. 1:54 Then we'll subtract the window function that we just created. 1:57 Average of player ranking over the entire result set. 2:01 This tells us how far above or below average each player's ranking is. 2:14 It's important to know that unlike regular aggregating functions, window functions do 2:21 not have access to all of the records in the tables that you're selecting from. 2:25 Instead, they have access to the results of the grouping 2:29 that you've specified in the queries group by clause. 2:32 This means that if your query has a group by clause. 2:35 You can only use the grouping fields directly while any other fields 2:39 have to be aggregated inside the window function aggregation for example, for 2:42 our second question how many total matches were played in the tournament. 2:47 We want to add up the check me field over the entire result set 2:51 If we try to do that directly using some of Checkmate over empty parentheses, 2:54 it will not run because Checkmate is not a grouping field. 2:59 It's helpful to think of the window function as querying 3:03 over the results of the existing query. 3:06 In other words the window only looks on to the rows returned by the query 3:08 when the function looks at each row of the result set it has to get one value for 3:12 that row. 3:16 To use and its calculation. 3:17 This example doesn't work because the query results don't include 3:19 the check mate field. 3:22 Each row represents a collection of records with 3:23 multiple values for check mate. 3:26 If we want to add up all of those values, we have to first sum them up for 3:28 the row using sum of Checkmate. 3:32 And then we can pass that aggregation to the window function. 3:34 We can do this by adding a sum call 3:37 inside of the sum that belongs to the window function. 3:41 Overall, we get sum of sum of checkmate over empty parentheses. 3:44 By summing up the sums of all the players matches one, 3:49 we find that there were 90 total matches in the tournament. 3:52 You can make some match inner and outer aggregations. 3:55 For example If we pass (MAX(CHECKMATE) to the window function instead of sum, 3:59 we'll get the number PLAYERS_WITH_WINS 4:03 If we reverse that and take the MAX(SUM(CHECKMATE), 4:19 we'll get the highest number of wins by any player. 4:22 Combining different aggregations and window functions multiplies the power of 4:28 the original aggregation, so we can answer more complex questions about the dataset. 4:31
You need to sign up for Treehouse in order to download course files.Sign up