Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Python

Travis Bailey
Travis Bailey
13,675 Points

Database for storage of social media data for analysis. How do I organize this?

I'm trying to automate some basic reporting around social media content performance. This one Python script I'm working on scans either a YouTube channel or YouTube playlist, and returns the channel's name, subscriber count, video titles, and video views.

While I can easily get it to print out the data I need to store it so I can later generate the report, and also perform analysis. I need to know what the subscriber and view counts were at specific dates, so I can't just store those numbers in one column and update them every time the script is run.

Currently I have it setup as below:

Table 1: videoId videoTitle

Table 2: channelID channelTitle

Table 3: weekId week

Table 4: sCheckId weekId(FK Table 3) channelId(FK Table 2) subscriberNumber

Table 5: vCheckID weekID(FK Table 3) channelID(FK Table2) videoId(FK Table1) videoViews

So initially the script runs and checks to see if the channel or videos already exist in table 2 and table 1 respectively. Table 3 generates a new row that contains the week ID which serves as the date reference so for example weekId 1501 is Year 15 week 1. The script pulls the channel's subscriber count and adds it to table 4. Table 4 generates an sCheckId(just a primary key that auto increments) and saves the subscriber count referencing the weekID from table 3, and the channelId from table 2. The script then checks the view counts of all the videos in a similar fashion by generating a vCheckID(another primary key that auto increments), saves the videoViews number, references the weekId for a date, the channelID so we know what YouTube channel the video lives on, and finally references the videoID so we know what video the view count is for.

I'm still trying to get the above to work, but I can't help but feel this is needlessly complicated. Anyone else have suggestion on how to store the data?

1 Answer

Why not combine tables 3, 4 and 5 into one and name it something a bit more relevant, such as 'reports'. You don't need a table to just hold a list of weeks. I think you'd be much better of storing an actual date in one column of a reports table.

Then, you can store rows for each 'report', which would have a date, a channel ID, and then either a subscriber number or a video ID and number of views.

Or if you really want you can use separate tables for 'channel reports' and 'video reports' or something similar.

You might want to decide whether you have a one channel to many videos relationship in your database as well (rather than always having multiple foreign key columns in your report tables).

Then use joins when you need them (you don't really need a channel ID to be stored in a report about a specific video, but if the relationship is there then you can join the videos table to get the corresponding channel ID).

If you do a search for 'SQL Scheme Generator' or similar, you can find some useful tools for planning this kind of thing, including this one: WWW SQL Designer.