1 00:00:00,280 --> 00:00:04,000 We've seen how common table expressions help us write queries that are easier 2 00:00:04,000 --> 00:00:05,290 to read. 3 00:00:05,290 --> 00:00:10,350 CTEs also help organize our queries to match how we think about a data question. 4 00:00:10,350 --> 00:00:13,910 They're also really helpful when you want to analyze data from a single result set 5 00:00:13,910 --> 00:00:18,810 in different ways to create a new result set that summarizes aggregate data. 6 00:00:18,810 --> 00:00:20,130 Let's look at a simple example. 7 00:00:21,130 --> 00:00:24,530 Say I want to see a list of sales people, the total number of sales for 8 00:00:24,530 --> 00:00:28,290 each employee, the total revenue generated by each employee, 9 00:00:28,290 --> 00:00:31,780 as well as the average revenue generated per sale. 10 00:00:31,780 --> 00:00:35,440 Data like this is stored in a database that includes several tables. 11 00:00:35,440 --> 00:00:39,230 In this basic example, I only need to think about three tables. 12 00:00:39,230 --> 00:00:42,010 The employees table, with each employees ID and 13 00:00:42,010 --> 00:00:47,600 name, the Orders table that tracks each sale from an employee to a customer, 14 00:00:47,600 --> 00:00:51,640 and an order details table which includes line items in an invoice. 15 00:00:51,640 --> 00:00:55,700 In other words, one record for each product sold as part of an order. 16 00:00:55,700 --> 00:00:58,820 Obviously, there'll be more columns in each of these tables and 17 00:00:58,820 --> 00:01:00,650 more tables in this database. 18 00:01:00,650 --> 00:01:03,830 But this is all we need to think about for this query. 19 00:01:03,830 --> 00:01:07,300 The way I think about this problem breaks down into three sets. 20 00:01:07,300 --> 00:01:10,530 I can convert each of those into a common table expression. 21 00:01:10,530 --> 00:01:14,690 First, I want to generate a list of all sales by each employee. 22 00:01:14,690 --> 00:01:16,870 In other words, one line for each order. 23 00:01:16,870 --> 00:01:21,537 That includes the order ID, employer ID, and the amount invoiced for that order. 24 00:01:24,630 --> 00:01:29,431 I'll start by creating a common table expression named all_sales. 25 00:01:29,431 --> 00:01:32,300 And I'll build this up in several stages. 26 00:01:32,300 --> 00:01:35,282 First, I'll get a list of all orders and employee IDs. 27 00:01:40,719 --> 00:01:45,974 To see what data the CTE generates, I'll add a temporary query and run it. 28 00:01:48,433 --> 00:01:51,650 So this lists all sales for each employee. 29 00:01:51,650 --> 00:01:54,870 But I also want to get the revenue generated from the sales. 30 00:01:54,870 --> 00:01:56,880 That's the quantity and unit price and 31 00:01:56,880 --> 00:01:59,750 that's contained in the order details table. 32 00:01:59,750 --> 00:02:02,626 So we need to join the order details table and 33 00:02:02,626 --> 00:02:06,213 group all the line items from the same order together. 34 00:02:14,392 --> 00:02:18,275 Then, to calculate the sum, we'll take the unit price, 35 00:02:18,275 --> 00:02:22,330 multiply it by the quantity for each line item in each order. 36 00:02:23,540 --> 00:02:26,440 Because we've grouped the results by OrderId, 37 00:02:26,440 --> 00:02:29,190 we generated the total revenue for each invoice. 38 00:02:31,470 --> 00:02:35,140 So now we have a results set like a virtual table with all the data we need to 39 00:02:35,140 --> 00:02:36,260 answer our questions. 40 00:02:37,340 --> 00:02:41,590 The second CTE I'm after calculates the total revenue per per employee. 41 00:02:41,590 --> 00:02:44,990 One really useful feature of CTEs is that you can reference them 42 00:02:44,990 --> 00:02:47,330 later in other CTEs. 43 00:02:47,330 --> 00:02:51,696 In this case, I can use all sales like a table within another common table 44 00:02:51,696 --> 00:02:53,282 expression, like this. 45 00:02:58,557 --> 00:03:05,270 I'll call this revenue_by_employee and I'll select the employee ID. 46 00:03:05,270 --> 00:03:09,260 And I'll create a sum of the invoices, that gives me total revenue. 47 00:03:09,260 --> 00:03:12,649 And I'll pull it from all sales, our first common table expression. 48 00:03:14,871 --> 00:03:17,393 One really important thing to keep in mind, 49 00:03:17,393 --> 00:03:20,815 you can only reference a CTE created earlier in your query. 50 00:03:20,815 --> 00:03:25,665 In other words, the first CTE in your query can't reference the second CTE, 51 00:03:25,665 --> 00:03:27,470 third or fourth. 52 00:03:27,470 --> 00:03:32,750 Likewise, the second CTE can't reference a third CTE or the fourth one. 53 00:03:32,750 --> 00:03:34,380 You'll get a syntax error if you try. 54 00:03:35,800 --> 00:03:39,770 The last CTE will use the all sales common table expression 55 00:03:39,770 --> 00:03:42,680 to count the number of orders recorded by each employee. 56 00:03:44,870 --> 00:03:46,990 I'll call it sales by employee. 57 00:03:49,516 --> 00:03:53,082 And I'll select the employee ID and I'll simply do a count. 58 00:03:53,082 --> 00:03:55,937 And this gives me the total sales for that employee. 59 00:04:07,431 --> 00:04:09,606 Now let's see what kind of data this gives us. 60 00:04:20,686 --> 00:04:25,680 We can see that employee number 1 has had 123 sales. 61 00:04:25,680 --> 00:04:27,900 Employee 2 has had 96. 62 00:04:27,900 --> 00:04:29,940 Now, to get our final result, 63 00:04:29,940 --> 00:04:33,350 it's just a matter of joining the revenue by employee and 64 00:04:33,350 --> 00:04:39,410 sales by employee common table expressions to generate one record per employee. 65 00:04:39,410 --> 00:04:43,180 I can join the two CTEs by the employee ID number. 66 00:04:48,478 --> 00:04:53,308 You'll notice that the first common table expression that I created isn't even used 67 00:04:53,308 --> 00:04:54,525 in this final query. 68 00:05:02,206 --> 00:05:05,137 The first CTE is simply a result set that I used to create 69 00:05:05,137 --> 00:05:06,613 the other two result sets. 70 00:05:08,427 --> 00:05:10,170 Let's see the results. 71 00:05:10,170 --> 00:05:13,440 Not much to see here, just a list of all employees with sales. 72 00:05:13,440 --> 00:05:17,249 Let's calculate sales figures based on information in the two CTEs. 73 00:05:21,292 --> 00:05:25,010 Total revenue comes from the second common table expression. 74 00:05:25,010 --> 00:05:27,970 A total count of the number of orders is from the third. 75 00:05:27,970 --> 00:05:29,799 To calculate the average per order, 76 00:05:29,799 --> 00:05:32,277 I'll just divide the revenue by the sales count. 77 00:05:39,588 --> 00:05:43,148 Now, the last step is to grab the employee's last name, 78 00:05:43,148 --> 00:05:47,833 which we can do simply by joining on the employees table by the EmployeeId. 79 00:05:54,617 --> 00:05:58,137 And to determine which sales person generated the most revenue, 80 00:05:58,137 --> 00:06:01,020 I'll order this by the total_revenue descending. 81 00:06:04,490 --> 00:06:07,708 And then I just need to add the employees name. 82 00:06:13,148 --> 00:06:16,190 Now, there are many ways we could've achieved the same results. 83 00:06:16,190 --> 00:06:18,750 I didn't have to use common table expressions. 84 00:06:18,750 --> 00:06:21,810 The same results could be achieved by joining all of these tables in 85 00:06:21,810 --> 00:06:23,230 a single query. 86 00:06:23,230 --> 00:06:26,950 However, the resulting query will be much more complicated looking. 87 00:06:26,950 --> 00:06:30,250 It also would have forced me to work through all the logic to weave the data 88 00:06:30,250 --> 00:06:32,460 together in one long query. 89 00:06:32,460 --> 00:06:36,460 Using CTE, I was able to breakdown my data analysis problem 90 00:06:36,460 --> 00:06:39,650 into three different queries, then combine those queries 91 00:06:39,650 --> 00:06:43,380 using some very simple joints to get the report I was after. 92 00:06:43,380 --> 00:06:45,470 Even better, this query is very readable. 93 00:06:45,470 --> 00:06:49,130 Anyone looking at it can easily see what information is being queried 94 00:06:49,130 --> 00:06:51,590 at each step and how the queries come together. 95 00:06:52,780 --> 00:06:57,690 I hope you can see how CTE's can simplify your SQL by breaking down queries 96 00:06:57,690 --> 00:07:02,460 into easily created and easily understood modules that can be combined in 97 00:07:02,460 --> 00:07:05,780 interesting ways to solve data analysis problems. 98 00:07:05,780 --> 00:07:06,390 Have fun.