Databases Querying Relational Databases Set Operations Union Operations

Shan Liu
Shan Liu
3,612 Points

Duplicates?

So in the case of where there are duplicate values like with the BMW or the Honda because of their id numbers being different, how would we exclude them? Would it be with a DISTINCT/GROUP BY combo that allows you to filter out the duplicates? or is there a different keyword that is being used for this certain circumstance?

2 Answers

Steven Parker
Steven Parker
202,001 Points

Either DISTINCT or GROUP BY can be used to eliminate duplicates. Generally, you'd use GROUP BY if your output contained at least one aggregate value, and DISTINCT otherwise.

But for this exercise, combining with UNION works, as it implies "distinct", just remember that it applies to all displayed columns. To get only distinct makes, that means you would display the MakeName but not the MakeId. If the MakeId is displayed, then the rows are not duplicates.

Steven Parker
Steven Parker
202,001 Points

The catch in using GROUP BY is determining an aggregate method that makes sense for the data. Since the ID's come from separate sets, it doesn't really make sense to combine them. But you can still use GROUP BY to give you a single row for each make if you return the ID's in separate columns:

SELECT MAX(MakeId) AS "Domestic ID",
       MAX(FM) AS "Foreign ID",
       MakeName
FROM (SELECT MakeId, null AS FM, MakeName FROM Make
      UNION SELECT null, ForeignMakeId, MakeName FROM ForeignMake)
GROUP BY MakeName;
kevin hudson
PRO
kevin hudson
Pro Student 9,652 Points

@Steven Parker I thought the same thing but DISTINCT and GROUP BY does nothing. When joining we see Make and ForeignMake tables info where BMW is listed twice but has different ID's. I am assuming that we are stuck with that unless we add another WHERE condition:

SELECT  * FROM Make
WHERE MakeName < "D" AND MakeName != "BMW"

UNION 
SELECT  * FROM ForeignMake
WHERE MakeName < "D" AND MakeName != "BMW"

ORDER BY MakeName; 

output

Steven Parker
Steven Parker
202,001 Points

The problem there is that "BMW" has been eliminated completely, and coding it that way requires prior knowledge of the contents of the database.

See the comment I added to my answer for an example of using GROUP BY.