Heads up! To view this whole video, sign in with your Courses Plus account or enroll in your free 7-day trial. Sign In Enroll
Preview
Start a free Courses trial
to watch this video
Supercharge your window function with partitions. Partitions let you compare a row to a subset of results from a query so that you can compare one row to a grouping or category of other, related rows.
Definitions
PARTITION BY — Introduces the partition clause, which divides the result set into groups. Remember that partitioning expressions must return one value per row of the result set.
Code Example
SELECT PLAYER_NAME,
PLAYER_RANKING,
PLAYER_TEAM,
COUNT(MATCH_ID) as MATCHES_PLAYED,
SUM(CHECKMATE) as MATCHES_WON,
SUM(CHECKMATE)/COUNT(MATCH_ID) as WIN_RATE,
AVG(PLAYER_RANKING) OVER () as AVG_RANKING,
PLAYER_RANKING - AVG(PLAYER_RANKING) OVER () as DIFF_FROM_AVERAGE,
SUM(SUM(CHECKMATE)) OVER () as TOTAL_MATCHES,
SUM(MAX(CHECKMATE)) OVER () as PLAYERS_WITH_WINS,
MAX(SUM(CHECKMATE)) OVER () as HIGHEST_WINS,
SUM(SUM(CHECKMATE)) OVER (PARTITION BY PLAYER_TEAM)
/ SUM(COUNT(MATCH_ID)) OVER (PARTITION BY PLAYER_TEAM) as TEAM_WIN_RATE,
MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE)) as TIEBREAKER,
MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE),
PLAYER_TEAM) as TIEBREAKER_TEAM
FROM treehouse.chess_data_matches
GROUP BY PLAYER_NAME, PLAYER_RANKING,PLAYER_TEAM
Sometimes, it makes sense to view a row in
the context of a subset of results from
0:00
the query instead of all of the results.
0:04
For instance,
with our chess match dataset,
0:08
you might want to know how one player's
performance compares to other players
0:10
on the same team rather
than to the entire field.
0:14
A partition clause divides
the result set into categories
0:17
according to conditions that you provide
within the parenthesis in the over clause.
0:21
These groupings are like panes within the
window of data that you are querying over.
0:25
Window functions on each row
have access to the rows pane.
0:30
Let's use partitions to find out which
team had the highest winning rate
0:34
in the tournament.
0:37
We already have a field for the players
winning percentage, which is calculated
0:39
as the sum of checkmate divided
by the count of match_id.
0:42
For the window function version of this
field, we'll start with the numerator,
0:46
the number of wins.
0:50
We'll use sum of sum of checkmate,
0:51
just like we did to get the total
number of matches in the tournament.
0:54
We'll add over to make
it a window function.
0:58
Inside the parenthesis, will add partition
by to start the partition clause.
1:02
Here, we'll put the conditions that the
function should use to separate the window
1:08
into panes.
1:13
We want the numerator to be the total
matches won by the player's team.
1:14
So we'll partition by player_team.
1:18
We'll do another window function for
the denominator, using sum,
1:20
of count, of match_id.
1:24
Again, over partition by player_team.
1:28
We can order by this field to make it
easier to see which teams win rate is
1:41
highest.
1:45
We do this by putting an order
by clause in the overall query.
1:46
Instead of typing the entire window
function again to tell the order by
1:51
clause which field to use for ordering,
1:55
we can use the number of the field
in the overall query as a shortcut.
1:57
In our overall query, the window function
we just created is the 12th field.
2:01
I'm also going add desc
2:05
to specify that it should order by
that field in a descending fashion.
2:08
That way, the team with the highest
win rate will be at the top and
2:12
it will be easier for us to see.
2:15
Let's run that.
2:16
Now each row shows the win
rate of that player's team.
2:20
We can see that the blue team had
the most successful tournament overall.
2:24
The partitioning conditions follow
the same rules as the expressions
2:28
that we passed to the window function.
2:31
They must return one value
per row of the result set.
2:33
Here, we can partition by player_team
because it's in the grouping statement.
2:36
But we couldn't partition by checkmate,
for example.
2:41
We could partition by sum of checkmate, so
that all players with the same number of
2:43
wins would be grouped together,
in one partition.
2:47
This would be helpful if we were trying to
figure out a tiebreaker between players,
2:51
who all won the same number of matches.
2:54
For example, you could break the tie by
2:56
identifying the player that had the lowest
ranking coming into the tournament,
2:59
because that player performed better
relative to their skill level.
3:02
You could say they overcame the greatest
odds and should win in the case of a tie.
3:05
Let's look at the minimum ranking for
3:09
players who won the same
number of matches.
3:10
We'll do that by creating another window
function, with min of player_ranking.
3:13
We can pass player_ranking directly to
the min function because it's in the group
3:21
by clause.
3:25
We'll add over to make
it a window function.
3:25
And in this case, in our partition clause,
3:29
we're going to partition
by the sum of checkmate.
3:32
And that's going to group together all of
the players that have the same number of
3:36
wins We'll order the results by
the number of matches won so
3:40
that we have all the players who tied for
first at the top.
3:47
To do that, we'll replace the 12
from last time by sum of checkmate.
3:51
We also have sum of checkmate
in our original query.
3:57
So if we wanted, we could go up here and
find that it's the 5th field and
4:00
use 5 instead.
4:04
If we look at players who won all of their
matches, here it's the top four rows,
4:05
we find that Robert had
the lowest ranking.
4:10
And if we look at the results of our
window function, we see that for
4:14
all the players who won all of their
matches, the tiebreaker number,
4:18
which is the minimum ranking for all those
players, matches Robert's ranking of 2209.
4:21
If you want to partition by multiple
items, separate them with commas.
4:27
Let's find the tiebreaker
within each team.
4:30
We'll do that by duplicating
the tiebreaker field and
4:33
calling it tiebreaker by team.
4:36
In the partition clause,
we'll add a partition by player_team,
4:38
right after the partition
by the number of wins.
4:42
Let's take another look at the players
who won all of their matches.
4:48
Again, these top four rows.
4:52
Two of them, Emily and Kate,
were both on the yellow team.
4:54
If we look at their rankings,
4:59
we can see that Emily went into the
tournament with the lower player ranking.
5:00
And if we look over at the results of our
window function, for the team tiebreaker.
5:04
If we look at the results for
those two rows, for
5:09
the two players who won all of
their matches on the yellow team.
5:12
The tiebreaker matches Emily's ranking.
5:14
Partitions are similar to the group
by clause that we use with normal
5:17
aggregating functions.
5:20
When we group rows to
use a normal function,
5:22
the rows that we're calculating over
are all collapsed into one record.
5:24
That happens in the original report,
5:28
where all of the matches
are collapsed by player.
5:30
Window functions, on the other hand,
are different and more powerful.
5:32
They leave the underlying
records from the query intact.
5:36
This means that we can use different
partitions to compare different groups of
5:39
data in the same report.
5:42
In our example, we created window
functions with three different partitions
5:43
and included them all in the same query.
5:47
Without window functions,
5:50
we need multiple queries,
each with its own group by clause.
5:51
In other words, window functions let
us write one report instead of three.
5:55
You need to sign up for Treehouse in order to download course files.
Sign up