Heads up! To view this whole video, sign in with your Courses Plus account or enroll in your free 7-day trial. Sign In Enroll
Preview
Start a free Courses trial
to watch this video
We conclude this workshop on SQL Window Functions by introducing frame clauses. A frame clause lets you control the size of the window, or the number of rows over which the window function operates. This lets you analyze data to determine trends and rolling averages.
Definitions
- Frame clause — Narrows the frame to a subset of rows based on the order of the rows
- ROWS — Keyword that introduces a frame clause. Treats each row individually for purposes of inclusion in the frame.
- RANGE — Keyword that introduces a frame clause. Groups rows together according to the ordering clause for purposes of inclusion in the frame (i.e. “CURRENT ROW” includes all of the rows with the same ordering value as the row on which the result is returned).
Code Example
SELECT PLAYER_NAME,
PLAYER_RANKING,
PLAYER_TEAM,
COUNT(MATCH_ID) as MATCHES_PLAYED,
SUM(CHECKMATE) as MATCHES_WON,
SUM(CHECKMATE)/COUNT(MATCH_ID) as WIN_RATE,
AVG(PLAYER_RANKING) OVER () as AVG_RANKING,
PLAYER_RANKING - AVG(PLAYER_RANKING) OVER () as DIFF_FROM_AVERAGE,
SUM(SUM(CHECKMATE)) OVER () as TOTAL_MATCHES,
SUM(MAX(CHECKMATE)) OVER () as PLAYERS_WITH_WINS,
MAX(SUM(CHECKMATE)) OVER () as HIGHEST_WINS,
SUM(SUM(CHECKMATE)) OVER (PARTITION BY PLAYER_TEAM)
/ SUM(COUNT(MATCH_ID)) OVER (PARTITION BY PLAYER_TEAM) as TEAM_WIN_RATE,
MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE)) as TIEBREAKER,
MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE),
PLAYER_TEAM) as TIEBREAKER_TEAM,
RANK() OVER (PARTITION BY PLAYER_TEAM ORDER BY PLAYER_RANKING DESC) as RANKING,
MAX(PLAYER_RANKING) OVER (ORDER BY SUM(CHECKMATE)
ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as MAX_RANKING_ROWS,
MAX(PLAYER_RANKING) OVER (ORDER BY SUM(CHECKMATE)
ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as MAX_RANKING_RANGE,
SUM(SUM(CHECKMATE)) OVER (ORDER BY PLAYER_RANKING DESC RANGE UNBOUNDED PRECEDING)
/ SUM(SUM(CHECKMATE)) OVER () as PROP_MATCHES_WON_BY_EQ_OR_HIGHER,
SUM(SUM(CHECKMATE)) OVER (ORDER BY PLAYER_RANKING
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
/ SUM(SUM(CHECKMATE)) OVER () as PROP_MATCHES_WON_BY_EQ_OR_HIGHER_2,
CAST(COUNT(PLAYER_NAME) OVER (ORDER BY PLAYER_RANKING DESC
RANGE UNBOUNDED PRECEDING) AS FLOAT)
/ COUNT(PLAYER_NAME) OVER ()::FLOAT as PROP_PLAYERS_EQ_OR_HIGHER
FROM treehouse.chess_data_matches
GROUP BY PLAYER_NAME, PLAYER_RANKING,PLAYER_TEAM
ORDER BY SUM(CHECKMATE)
We can enhance the power of
window functions further,
0:00
by specifying a frame clause.
0:03
The frame clause is appropriately named,
because it determines the size of
0:05
the window, or the number of rows over
which the window function operates.
0:08
So far ,we've been working
with the default frame
0:12
which encompasses the total result set.
0:15
Frame clauses allow us to narrow
our focus to a subset of rows.
0:18
For example, we could look from six
rows before up to the current row.
0:22
This is useful for
doing moving averages or running sum.
0:26
Say that you have a report of visitors
to a website broken down by day.
0:29
It's likely to have a lot of variation
depending on the day of the week.
0:34
Maybe people are more likely to
visit the site on weekdays, but
0:37
visits are lower on weekends.
0:40
With a lot of variation, it's hard to see
whether visits are trending up or down.
0:42
By adding in the average for the day,
plus the preceding six days,
0:46
you can get a rolling weekly average that
washes out day of the week differences.
0:50
Let's look at how to
create a frame clause,
0:54
it comes directly after
the order by clause.
0:56
Anytime that we have a frame clause,
there must be an ordering clause,
0:59
to specify which rows are before,
and which are after the current row.
1:02
The frame clause, tells the function
which row to start the window on, and
1:06
where to end it.
1:10
The syntax of the frame clause is between,
frames start, and frame end.
1:11
For the frame start and frame end,
there are three options.
1:17
We can specify the current row,
rows preceding and rows following.
1:21
The current row refers to the row of
the query on which the result will appear.
1:26
Rows preceding come before the current
row according to the ordering clause and
1:31
rows following come after.
1:35
If we specify unbounded preceding or
1:37
unbounded following, that means
the beginning or end of the window.
1:39
Sometimes the order clause will return
the same value for multiple rows.
1:43
For example,
if we ordered by the number of wins,
1:47
multiple players will
have the same number.
1:50
The keyword that we used to introduce
the frame clause either rows or
1:53
range determines what happens then.
1:56
Let's look at the chess tournament
report to see what the difference is.
1:59
Let's find out for each player
what the maximum ranking is for
2:03
players who won the same
number of matches or fewer.
2:06
We'll create a new field
using MAX( PLAYER_RANKING).
2:08
For the order by clause, we'll use
SUM(CHECKMATE) to get the number of wins.
2:17
Then we'll add the frame clause.
2:23
Let's see what happens when we introduce
the frame using the ROWS keyword.
2:25
We want the maximum to be calculated for
all of the rows before.
2:29
So we'll make the frame
start UNBOUNDED PRECEDING.
2:33
We don't want to include any of the rows
after, so we'll make the frame And
2:39
CURRENT ROW.
2:43
To compare this to what happens when we
introduce the frame clause using the range
2:51
keyword, let's duplicate
this entire function.
2:55
Then we'll replace the word ROWS,
with the word RANGE.
3:01
Let's order the entire query
by the SUM(CHECKMATE), so
3:13
that it's easier to see what
these functions are doing.
3:16
Let's look at the results of
our two window functions.
3:21
When we use rows,
3:24
sometimes the values change even when
the rows have the same order value.
3:25
This is because row looks at
the actual row number, but
3:29
range counts any row with the same
order value as part of the current row.
3:32
Range is the better choice, if you're
doing calculations that depend on an order
3:36
value that could be the same
across multiple rows.
3:40
In a regional sales report with one row
per region per day, you could use the SUM
3:43
window function partition by date to
get the total company sales to date.
3:47
Range would be the best choice in this
example because you want the total
3:51
to date to include all
the sales from the current date
3:54
not just from the regions that
are listed before the current row.
3:57
The rows keyword is sometimes the better
choice, because it allows you more control
4:00
over the number of rows
to include in the window.
4:04
Here, we've specified UNBOUNDED to
get to the beginning of the window.
4:07
If you use the ROWS keyword,
you can replace UNBOUNDED with an integer.
4:10
This is useful when you
want rolling totals or
4:14
averages over a certain number of rows.
4:16
For example, in a daily sales report
you can have a rolling seven days total
4:19
by using rows between six preceding and
current row.
4:24
If you leave out the frame clause
the default is range between
4:28
unbounded preceding and current row.
4:31
If there is also no ordering clause,
then every row is ordered equally so
4:34
the entire window is included.
4:38
There is also a simplified syntax that we
can use, if we want the frame to end at
4:40
the current row, in that case we can
leave out the between keyword, and
4:44
just use rows or
range followed by the frame start.
4:48
Now that we've seen how to create frame
clauses, we can answer our final question.
4:52
Do higher ranked players win more
often the lower ranked ones?
4:56
To do this, we'll find the percent of
all matches that are won by players with
5:00
a ranking equal to or
higher than the player in question.
5:04
Well compare that to the percentage
of total players that are of equal or
5:07
higher ranking.
5:10
First, let's find the numerator for
the win rate.
5:12
We know how to find the total
number of wins SUM(SUM(CHECKMATE)).
5:15
We're interested in how
all players compare, so
5:23
we'll leave out the partition clause.
5:25
We need an ordering clause to tell
the function which rows are higher and
5:28
which are lower.
5:31
So we'll use ORDER BY PLAYER_RANKING DESC.
5:33
Now, we'll add the frame clause.
5:40
There aren't any players with
the same ranking, but there could be.
5:42
It's good practice to use
RANGE in a case like this.
5:45
Since the order is descending,
5:48
all the higher rankings will be
in the fourth and current row.
5:49
So our frame start will
be UNBOUNDED PRECEDEDING.
5:52
We can use the simplified syntax, since we
want to end the frame on the current row.
5:55
Now we have the total matches won by
players with equal or higher ranking.
6:16
We'll divide by the total
number of matches.
6:20
We get that using SUM(SUM(CHECKMATE)) over
6:25
empty parenthesis for the entire window.
6:30
We'll compare this to the proportion
of players of equal or higher ranking.
6:39
We'll make a new field for that.
6:46
We'll use COUNT(PLAYER_NAME) to get
the total number of players, over, and
6:52
we're going to leave out the partitions
since we want the entire field.
6:58
But we're going to ORDER BY
the PLAYER_RANKING again.
7:05
Just like above, that will be descending
over a RANGE UNBOUNDED PRECEDING.
7:12
And we'll divide it by
the total number of players.
7:17
COUNT(PLAYER_NAME) OVER the entire window.
7:20
When we run the query we find
that it's all 0s and 1s.
7:46
That's because we're
dividing two integers.
7:49
In PostgreSQL, the precision of
the answer defaults to an integer also.
7:51
We want it to return
a number with decimals.
7:56
We'll need to change the type of the
numerator, denominator or both to do that.
7:58
Let's start with the numerator.
8:02
We can change it using the CAST function.
8:07
Which requires the expression
to be changed.
8:12
Which is the whole window function.
8:16
Then the word AS, and then the type.
8:20
We'll use FLOAT,
which will give us a number with decimals.
8:24
It's important to know to use the whole
window function including over and
8:28
the parenthesis containing the options
as the expression for a type conversion.
8:32
If you try to put the CAST just around
the first part, before the over keyword,
8:37
it won't work.
8:40
Now the proportion has decimals, so
8:45
it's showing us a lot more
information than just zeroes or ones.
8:47
We can also use a shorthand for
8:50
type conversion, which is a double
colon followed by the type.
8:51
Let's do that on the denominator It needs
to go at the end of the window function.
8:55
Similar to CAST,
9:03
it won't work if the double colon
notation is used before the OVER keyword.
9:04
Looking at the results of our window
function again, we can see that
9:13
the winning percentage is consistently
higher than the proportion of players, so
9:16
higher ranked players win more
frequently than lower ranked ones.
9:20
Now you have everything you need to
incorporate window functions into
9:24
your reports.
9:28
Feel free to experiment with the chess
dataset, or with your own data
9:29
to explore what window functions can
add to your analytical tool kit.
9:33
What other questions can you think of that
you can answer using window functions?
9:37
Try passing different expressions
in the different clauses and
9:41
see how it affects your result.
9:43
Above all,
have fun getting to know the data.
9:45
We'll see you next time.
9:48
You need to sign up for Treehouse in order to download course files.
Sign up