🌟 Dreaming of a bright future? 🎓 Ask about the Treehouse Scholarship program! 🚀

🤖 Level up your chatbot knowledge with our latest AI course.

Join our free community Discord server here!

Learn React with us!

Preview

Start a free Courses trial
to watch this video

Sign up for Treehouse

Grouping Data With Partitions

5:59 with Danielle Kiowski

Supercharge your window function with partitions. Partitions let you compare a row to a subset of results from a query so that you can compare one row to a grouping or category of other, related rows.

Teacher's Notes
Questions?
Video Transcript
Downloads
Workspaces

Definitions

PARTITION BY — Introduces the partition clause, which divides the result set into groups. Remember that partitioning expressions must return one value per row of the result set.

Code Example

SELECT PLAYER_NAME, 
    PLAYER_RANKING,
    PLAYER_TEAM,
    COUNT(MATCH_ID) as MATCHES_PLAYED,
    SUM(CHECKMATE) as MATCHES_WON,
    SUM(CHECKMATE)/COUNT(MATCH_ID) as WIN_RATE,
    AVG(PLAYER_RANKING) OVER () as AVG_RANKING,
    PLAYER_RANKING - AVG(PLAYER_RANKING) OVER () as DIFF_FROM_AVERAGE,
    SUM(SUM(CHECKMATE)) OVER () as TOTAL_MATCHES,
    SUM(MAX(CHECKMATE)) OVER () as PLAYERS_WITH_WINS,
    MAX(SUM(CHECKMATE)) OVER () as HIGHEST_WINS,
    SUM(SUM(CHECKMATE)) OVER (PARTITION BY PLAYER_TEAM)
        / SUM(COUNT(MATCH_ID)) OVER (PARTITION BY PLAYER_TEAM) as TEAM_WIN_RATE,
    MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE)) as TIEBREAKER,
    MIN(PLAYER_RANKING) OVER (PARTITION BY SUM(CHECKMATE), 
    PLAYER_TEAM) as TIEBREAKER_TEAM
FROM treehouse.chess_data_matches
GROUP BY PLAYER_NAME, PLAYER_RANKING,PLAYER_TEAM

Related Discussions

Have questions about this video? Start a discussion with the community and Treehouse staff.

Sign up

Related Discussions

Have questions about this video? Start a discussion with the community and Treehouse staff.

Sign up

Sometimes, it makes sense to view a row in the context of a subset of results from 0:00

the query instead of all of the results. 0:04

For instance, with our chess match dataset, 0:08

you might want to know how one player's performance compares to other players 0:10

on the same team rather than to the entire field. 0:14

A partition clause divides the result set into categories 0:17

according to conditions that you provide within the parenthesis in the over clause. 0:21

These groupings are like panes within the window of data that you are querying over. 0:25

Window functions on each row have access to the rows pane. 0:30

Let's use partitions to find out which team had the highest winning rate 0:34

in the tournament. 0:37

We already have a field for the players winning percentage, which is calculated 0:39

as the sum of checkmate divided by the count of match_id. 0:42

For the window function version of this field, we'll start with the numerator, 0:46

the number of wins. 0:50

We'll use sum of sum of checkmate, 0:51

just like we did to get the total number of matches in the tournament. 0:54

We'll add over to make it a window function. 0:58

Inside the parenthesis, will add partition by to start the partition clause. 1:02

Here, we'll put the conditions that the function should use to separate the window 1:08

into panes. 1:13

We want the numerator to be the total matches won by the player's team. 1:14

So we'll partition by player_team. 1:18

We'll do another window function for the denominator, using sum, 1:20

of count, of match_id. 1:24

Again, over partition by player_team. 1:28

We can order by this field to make it easier to see which teams win rate is 1:41

highest. 1:45

We do this by putting an order by clause in the overall query. 1:46

Instead of typing the entire window function again to tell the order by 1:51

clause which field to use for ordering, 1:55

we can use the number of the field in the overall query as a shortcut. 1:57

In our overall query, the window function we just created is the 12th field. 2:01

I'm also going add desc 2:05

to specify that it should order by that field in a descending fashion. 2:08

That way, the team with the highest win rate will be at the top and 2:12

it will be easier for us to see. 2:15

Let's run that. 2:16

Now each row shows the win rate of that player's team. 2:20

We can see that the blue team had the most successful tournament overall. 2:24

The partitioning conditions follow the same rules as the expressions 2:28

that we passed to the window function. 2:31

They must return one value per row of the result set. 2:33

Here, we can partition by player_team because it's in the grouping statement. 2:36

But we couldn't partition by checkmate, for example. 2:41

We could partition by sum of checkmate, so that all players with the same number of 2:43

wins would be grouped together, in one partition. 2:47

This would be helpful if we were trying to figure out a tiebreaker between players, 2:51

who all won the same number of matches. 2:54

For example, you could break the tie by 2:56

identifying the player that had the lowest ranking coming into the tournament, 2:59

because that player performed better relative to their skill level. 3:02

You could say they overcame the greatest odds and should win in the case of a tie. 3:05

Let's look at the minimum ranking for 3:09

players who won the same number of matches. 3:10

We'll do that by creating another window function, with min of player_ranking. 3:13

We can pass player_ranking directly to the min function because it's in the group 3:21

by clause. 3:25

We'll add over to make it a window function. 3:25

And in this case, in our partition clause, 3:29

we're going to partition by the sum of checkmate. 3:32

And that's going to group together all of the players that have the same number of 3:36

wins We'll order the results by the number of matches won so 3:40

that we have all the players who tied for first at the top. 3:47

To do that, we'll replace the 12 from last time by sum of checkmate. 3:51

We also have sum of checkmate in our original query. 3:57

So if we wanted, we could go up here and find that it's the 5th field and 4:00

use 5 instead. 4:04

If we look at players who won all of their matches, here it's the top four rows, 4:05

we find that Robert had the lowest ranking. 4:10

And if we look at the results of our window function, we see that for 4:14

all the players who won all of their matches, the tiebreaker number, 4:18

which is the minimum ranking for all those players, matches Robert's ranking of 2209. 4:21

If you want to partition by multiple items, separate them with commas. 4:27

Let's find the tiebreaker within each team. 4:30

We'll do that by duplicating the tiebreaker field and 4:33

calling it tiebreaker by team. 4:36

In the partition clause, we'll add a partition by player_team, 4:38

right after the partition by the number of wins. 4:42

Let's take another look at the players who won all of their matches. 4:48

Again, these top four rows. 4:52

Two of them, Emily and Kate, were both on the yellow team. 4:54

If we look at their rankings, 4:59

we can see that Emily went into the tournament with the lower player ranking. 5:00

And if we look over at the results of our window function, for the team tiebreaker. 5:04

If we look at the results for those two rows, for 5:09

the two players who won all of their matches on the yellow team. 5:12

The tiebreaker matches Emily's ranking. 5:14

Partitions are similar to the group by clause that we use with normal 5:17

aggregating functions. 5:20

When we group rows to use a normal function, 5:22

the rows that we're calculating over are all collapsed into one record. 5:24

That happens in the original report, 5:28

where all of the matches are collapsed by player. 5:30

Window functions, on the other hand, are different and more powerful. 5:32

They leave the underlying records from the query intact. 5:36

This means that we can use different partitions to compare different groups of 5:39

data in the same report. 5:42

In our example, we created window functions with three different partitions 5:43

and included them all in the same query. 5:47

Without window functions, 5:50

we need multiple queries, each with its own group by clause. 5:51

In other words, window functions let us write one report instead of three. 5:55

You need to sign up for Treehouse in order to download course files.

Sign up

You need to sign up for Treehouse in order to set up Workspace

Sign up