Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Databases

Jan Lundeen
Jan Lundeen
5,882 Points

Real life examples - when to use a subquery with IN vs. a subquery with derived tables(Querying Relational Databases)

Hi,

In the section on subqueries in the Querying Relational Databases course, there are sections on subqueries using IN and subqueries using derived tables. From my notes, I know that we create derived tables in a query because the data we need isn't stored in the exact format or structure that we want. This happens a lot when doing a lot of aggregation or reporting queries. However, in a real-life situation, I'm not sure I'd know "off the bat" which one would be best to use. Can you help me out with this? Real life examples would really help.

Thanks,

Jan

1 Answer

Rebekah Smith
Rebekah Smith
7,263 Points

Hi Jan,

tl;dr Use a regular old JOIN when possible. Use a derived table otherwise.

In almost every case, a WHERE...IN(subquery) or a derived table should be rewritten to use a JOIN instead. (Other subqueries like WHERE...NOT EXISTS(subquery) are really useful but that's off-topic.) When a JOIN is insufficient, the derived table approach is almost always better than WHERE...IN(subquery) because it's more powerful/flexible (you can only return 1 column from WHERE...IN(subquery)) and I believe may perform better with large result set.

Using WHERE...IN(subquery) Example: Our client sells products. They want productorder info for their biggest sales The productorder table includes product id, quantity purchased, what the customer was charged, etc. The sale table has info on sales like date, totalrevenue, etc.

SELECT *
FROM productorder
WHERE date IN (SELECT date FROM sale ORDER BY totalrevenue DESC LIMIT 3);

After racking my brain, the above query is the only example I could come up with where I would use WHERE...IN(subquery) because it's just a little cleaner. Even so, it could also be written as a derived table, with the same performance and results:

SELECT po.* 
FROM productorder po
INNER JOIN (SELECT date FROM sale ORDER BY totalrevenue DESC LIMIT 3) AS s ON s.date = po.date;

Using derived table Example: Our client is considering selling some products in bulk so they need info on how their customers purchase large quantities. For each product, they need the product's id and the total amount customers have spent on large orders.

SELECT productid, SUM(charged) AS 'Total spent on large orders'
FROM
(SELECT po.productid, po.charged FROM productorder AS po WHERE po.quantity > 20) AS largeorder
GROUP BY productid

There's just no way to write the above query using a JOIN or WHERE...IN because of the filtering and aggregation, and the multiple columns returned from the derived table.

Jan Lundeen
Jan Lundeen
5,882 Points

Thanks Rebekah! That helps.

Jan