Bummer! This is just a preview. You need to be signed in with a Pro account to view the entire video.
Start a free Basic trial
to watch this video
Learn what SQL Window Functions are and why they're useful.
Follow along with this workshop:
You can download the database used in this workshop from the Downloads tab and load it into the database of your choice. Or, you can use Mode Analytics, an online database and SQL reporting service. You can create a free account and use our free data-source for this workshop:
- Create a free account on Mode: https://modeanalytics.com/signup
- Create a new Mode Report
- For the "data source" select the Mode Public Warehouse
- Use the
treehouse.chess_data_matches
table
You can then execute the SQL that we show you here in Mode.
Definitions:
Window function An SQL function that aggregates over the results of the query.
PostgreSQL Download
SQL from Video
SELECT PLAYER_NAME,
PLAYER_RANKING,
PLAYER_TEAM,
COUNT(MATCH_ID) AS MATCHES_PLAYED,
SUM(CHECKMATE) AS MATCHES_WON,
SUM(CHECKMATE)/COUNT(MATCH_ID) AS WIN_RATE
FROM treehouse.chess_data_matches
GROUP BY PLAYER_NAME,
PLAYER_RANKING,
PLAYER_TEAM
-
0:00
[MUSIC]
-
0:04
[SOUND] Hi I'm Danielle.
-
0:06
A data scientist here at Treehouse.
-
0:08
Window functions are a powerful feature available in some variations of SQL.
-
0:13
They let you analyze a row within the context of an entire result set.
-
0:17
In other words, you can compare one row to the other rows in a query.
-
0:21
For example, you can use them to calculate each row as a percent of the total or
-
0:26
to add a moving average to a report.
-
0:28
Say, you have a report listing your company's monthly sales by region,
-
0:31
one row of results per month per region.
-
0:36
If you want to track the performance of region over time,
-
0:38
looking at the sales number itself can be misleading.
-
0:41
The region sales may go up in one month, but
-
0:44
if the total sales went up more, that region could still be underperforming.
-
0:48
You can use window functions to calculate each region sales
-
0:51
as a percentage of total monthly sales, and
-
0:53
gauge performance, by tracking how the region's share of sales changes over time.
-
0:58
You can also use window functions to aggregate on multiple
-
1:01
levels within one query, just by using a function call.
-
1:05
If you want to see a sum for row as well as for a category,
-
1:08
that the row falls into, window functions can do that.
-
1:12
In the sales report example, you could compile your regional totals and
-
1:15
monthly totals in the same query.
-
1:17
This replaces the need for multiple reports, self joins, or
-
1:21
further aggregation outside of the original report.
-
1:24
So it streamlines queries and keeps everything in one place.
-
1:28
SQL Server, Oracle, PostgreSQL and similar database systems,
-
1:32
including Amazon Redshift and Snowflake, all support window functions.
-
1:38
If you're working with any of these variations of SQL,
-
1:40
window functions are a great way to take your SQL skills to the next level.
-
1:45
In this workshop, we'll be exploring a dataset of matches at a chess tournament.
-
1:49
Imagine that you're a reporter for a local paper, and
-
1:52
you're writing an article on the latest community chess tournament.
-
1:55
You want to understand more about how the players did, and you know
-
1:59
that with a little analysis, you'll find some interesting facts to report.
-
2:03
If you'd like to follow along,
-
2:04
you can download the data I'll be using from the downloads tab below this video.
-
2:08
Feel free to load it into your local database and follow along.
-
2:11
I'll be using PostgreSQL in these videos, but
-
2:14
any of the other databases systems listed above should work.
-
2:17
PostgreSQL is an open source database system, you can download it for free.
-
2:22
Check out the teacher's notes for more information.
-
2:25
Let's take a look at the data.
-
2:27
This is the dataset containing the results of the chess tournament.
-
2:30
It contains two records for each match, one line per player.
-
2:35
Each record has the player's ranking and team name.
-
2:38
It also has a checkmate column, which indicates if the player won that match.
-
2:43
It's 1 for a win and 0 for a loss.
-
2:46
We've put together a preliminary report using this dataset.
-
2:49
It shows each player's name, rank, team,
-
2:53
number of matches played, number of wins and win rate.
-
2:57
Here's the underlying SQL for this report.
-
3:00
These aggregates are obtained using regular functions.
-
3:03
So, they give us the count or sum within a report's grouping.
-
3:07
In this case, by the player's name, ranking and team.
-
3:13
The information that we have in the report is valuable but
-
3:15
it leaves a lot of questions unanswered.
-
3:18
It's missing context, which is key to making our article interesting.
-
3:22
We want to know how each player relates to the others in the tournament, so
-
3:26
that our article can provide a comprehensive view of what happened.
-
3:29
[SOUND] The questions that we'll explore are,
-
3:32
how does each player's ranking compare to the overall field?
-
3:35
How many total matches were played at the tournament?
-
3:38
Which team had the highest winning percentage?
-
3:41
Who is the top ranked player on each team?
-
3:44
And finally, do higher-ranked players win more often than lower-ranked ones?
-
3:49
Best of all, we won't need to create additional reports to find the answers.
-
3:53
Window functions will help us answer all of these questions
-
3:56
within our original report.
You need to sign up for Treehouse in order to download course files.
Sign up