Heads up! To view this whole video, sign in with your Courses account or enroll in your free 7-day trial. Sign In Enroll
Well done!
      You have completed Combining Data for Analysis!
      
    
You have completed Combining Data for Analysis!
Preview
    
      
  Welcome back! In this video, I will provide an overview of the data we will use in the workshop, and review the concept of joining data.
Types of Joins
- The Inner Join returns only matching records found in both tables.
 - The Outer Join returns all records from both tables (aka Full Outer Join).
 - The Left Join returns all records from the left table and the matching records from the right table (aka Left Outer Join).
 - The Right Join returns all records from the right table and the matching records from the left table (aka Right Outer Join).
 
Related Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign upRelated Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign up
                      Welcome back.
                      0:00
                    
                    
                      In this video, I will provide an overview
of the data we will use in the workshop,
                      0:02
                    
                    
                      and review the concept of joining data.
                      0:06
                    
                    
                      If you're familiar with SQL or
relational algebra,
                      0:09
                    
                    
                      you're probably familiar with the term "join".
                      0:12
                    
                    
                      You may have seen the Venn diagram
with the two interlocking shapes, and
                      0:15
                    
                    
                      we'll revisit that idea.
                      0:19
                    
                    
                      There are four types of joins.
                      0:21
                    
                    
                      So, let's look at how each join
type yields a different result.
                      0:23
                    
                    
                      For this example, I have two tables.
                      0:27
                    
                    
                      The first table is the Top 10 songs
from the Billboard 100 Chart,
                      0:30
                    
                    
                      released on July 28, 2018.
                      0:35
                    
                    
                      It contains the Billboard rank
along with the song information.
                      0:37
                    
                    
                      The second table contains the 12 songs
with the most streams on the Spotify music
                      0:42
                    
                    
                      platform for the same week.
                      0:46
                    
                    
                      This table contains the song info and
the number of streams for that week.
                      0:49
                    
                    
                      Comparing the name column in both tables,
there are six songs present in both,
                      0:54
                    
                    
                      In My Feelings by Drake,
I Like It by Cardi B,
                      1:02
                    
                    
                      Girls Like You by Maroon 5,
Nice for What, also by Drake,
                      1:06
                    
                    
                      Lucid Dreams by Juice WRLD,
Better Now by Post Malone.
                      1:11
                    
                    
                      There are additional songs in each table
that are not present in the other table.
                      1:16
                    
                    
                      Each join option handles those
additional songs differently.
                      1:21
                    
                    
                      The inner join returns only matching
records found in both tables.
                      1:26
                    
                    
                      The matching records will display as
a combination from the two tables as
                      1:31
                    
                    
                      a single record.
                      1:35
                    
                    
                      For rows for which there is no
match between the two tables,
                      1:37
                    
                    
                      the records are discarded.
                      1:40
                    
                    
                      The inner join results in
the smallest combined dataset.
                      1:43
                    
                    
                      For this example, the combined table will
contain the six records found in both
                      1:47
                    
                    
                      tables, listing the Billboard rank,
song info, and number of Spotify streams.
                      1:52
                    
                    
                      The other songs from both
tables are discarded.
                      1:58
                    
                    
                      The outer join, or full outer join,
returns all records from both tables.
                      2:02
                    
                    
                      The matching records will display as
a combination from the two tables,
                      2:08
                    
                    
                      as a single record.
                      2:12
                    
                    
                      The number of rows in the resulting table,
will be the sum of the rows from the two
                      2:14
                    
                    
                      tables minus the number
of matching records.
                      2:18
                    
                    
                      In this case, 10 rows from the left table,
plus 12 rows from the right table,
                      2:22
                    
                    
                      minus 6 matching records,
equals 16 rows in the combined table.
                      2:27
                    
                    
                      The combined table will include
the Billboard rank, song info, and
                      2:33
                    
                    
                      number of Spotify streams in each row.
                      2:37
                    
                    
                      We have the Billboard rank and
Spotify streams for six songs.
                      2:41
                    
                    
                      However, we don't have the number of
Spotify streams for four songs for
                      2:46
                    
                    
                      the Billboard table, and
                      2:50
                    
                    
                      we don't have the Billboard rank of
six songs from the Spotify table.
                      2:51
                    
                    
                      For the missing numbers,
we show them as null or NAN, Not a Number.
                      2:56
                    
                    
                      In an outer join,
we preserve all data from both tables.
                      3:02
                    
                    
                      We don't lose any information
with a full outer join.
                      3:06
                    
                    
                      This results in the largest
combined dataset.
                      3:10
                    
                    
                      The left join, or left outer join, returns
all records from the left table and
                      3:13
                    
                    
                      the matching records from the right
table if any matching records exist.
                      3:18
                    
                    
                      The resulting table will contain the same
number of rows as the left table, and
                      3:24
                    
                    
                      contain the same columns.
                      3:28
                    
                    
                      Additional columns found in the right
table will be added, as well.
                      3:30
                    
                    
                      For this example, the combined
table will contain ten records,
                      3:34
                    
                    
                      listing the Billboard rank song info,
and number of Spotify streams.
                      3:38
                    
                    
                      We have both Billboard rank and
Spotify streams for six songs.
                      3:44
                    
                    
                      However, we don't have the number
of Spotify streams for
                      3:48
                    
                    
                      four songs from the Billboard table.
                      3:51
                    
                    
                      For those values, we usually show these
missing values as null or Not a Number.
                      3:54
                    
                    
                      In the left join, we preserve all
data from the Billboard table and
                      3:59
                    
                    
                      add relevant data from the Spotify table.
                      4:03
                    
                    
                      The other songs from the Spotify
table are discarded.
                      4:06
                    
                    
                      The right join for right outer join,
returns all records from the right table
                      4:10
                    
                    
                      and the matching records from the left
table, if any matching records exists.
                      4:14
                    
                    
                      The resulting table will contain the same
number of rows as the right table and
                      4:20
                    
                    
                      contain the same columns.
                      4:23
                    
                    
                      Additional columns found in the left
table will be added, as well.
                      4:26
                    
                    
                      For rows for
which there is no match on the right side,
                      4:30
                    
                    
                      the resulting table will contain
a null value or Not a Number.
                      4:33
                    
                    
                      For this example, that combined
table will contain 12 records,
                      4:37
                    
                    
                      listing the Billboard rank, song info, and
number of Spotify streams.
                      4:41
                    
                    
                      We have both Billboard rank and
Spotify streams for six songs.
                      4:46
                    
                    
                      However, we don't have the Billboard rank
for six songs from the Spotify table.
                      4:50
                    
                    
                      For those values, we usually show these
missing values as null or Not a Number.
                      4:56
                    
                    
                      In the right join, we preserve all
data from the Spotify table, and
                      5:01
                    
                    
                      add relevant data from
the Billboard table.
                      5:05
                    
                    
                      The other songs from the Billboard
table are discarded.
                      5:08
                    
                    
                      We have much more data from Billboard and
Spotify for our analysis.
                      5:12
                    
                    
                      In fact, our data sets include
song information on the weekly
                      5:16
                    
                    
                      Billboard 100 and
Spotify 200 charts for TWO WHOLE YEARS.
                      5:20
                    
                    
                      That's tens of thousands of records for
us to analyze.
                      5:25
                    
                    
                      Here's our mission.
                      5:29
                    
                    
                      For every song on the Billboard Chart,
                      5:30
                    
                    
                      we wanna know how many Spotify
streams are received for that week.
                      5:32
                    
                    
                      Billboard will be our left dataset,
and Spotify will be our right dataset.
                      5:36
                    
                    
                      Can you guess which type
of join we want to perform?
                      5:40
                    
                    
                      If you said left join, or
left outer join, you would be correct.
                      5:44
                    
                    
                      For our purposes, if a song was not on the
Billboard Chart that week, we don't need
                      5:49
                    
                    
                      to know how many times it was streamed,
so that data will be discarded.
                      5:54
                    
                    
                      Here's another scenario.
                      5:59
                    
                    
                      What if we have another table that
contains the next five songs on
                      6:01
                    
                    
                      the Billboard Chart for our chosen week?
                      6:04
                    
                    
                      We could append these new
songs to the original list.
                      6:07
                    
                    
                      And now we have a new table with the top
15 songs on the Billboard Chart.
                      6:10
                    
                    
                      We will learn how to combine
data in this fashion, as well.
                      6:16
                    
                    
                      Like I said,
Pandas is a very powerful tool.
                      6:19
                    
                    
                      I can't wait to get started.
                      6:23
                    
                    
                      In the next video,
we will create our datasets and
                      6:25
                    
                    
                      learn how to merge them
within the Pandas tool.
                      6:27
                    
                    
                      See you there.
                      6:30
                    
              
        You need to sign up for Treehouse in order to download course files.
Sign upYou need to sign up for Treehouse in order to set up Workspace
Sign up