
Python

T SL (Courses Plus Student 378 Points)

Best way to pull a dataset from Python into a NoSQL database

Hi,

I want to pull a dataset from an API every x minutes into a MongoDB database, then serve the data from Flask to a Vue app (that part I can do).

The data is in the format of

{ "id": 1234567, "placement": "abc1234", "ctr": "0.04", "cost": 5.8, "visits": 209, ..... }

The data doesn't include a timestamp, so on each successful pull I would store the pull time along with it.

Later, say 5 minutes later, I'd pull it again (via a task manager). By then the ctr might have changed, and cost and visits will have accumulated further. Every pull covers the period from the start of the day until the current time.

So I'd have

00:00 -- 00:05
00:00 -- 00:11
00:00 -- 00:16

etc.
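A minimal sketch of that pull-and-store step, assuming a hypothetical `fetch_report()` that returns the API's JSON dict; the pymongo call is only shown in a comment (collection names assumed), since the timestamping itself is plain Python:

```python
from datetime import datetime, timezone

def make_snapshot(raw):
    """Attach a pull timestamp to the raw API payload.

    The API data has no time field, so we record when the pull
    happened; all later interval calculations key off pulled_at.
    """
    doc = dict(raw)  # copy so the original payload is untouched
    doc["pulled_at"] = datetime.now(timezone.utc)
    return doc

# With a live MongoDB this would be stored as-is (names assumed):
#   from pymongo import MongoClient
#   coll = MongoClient()["ads"]["snapshots"]
#   coll.insert_one(make_snapshot(fetch_report()))

snap = make_snapshot({"id": 1234567, "cost": 5.8, "visits": 209})
print(sorted(snap))  # ['cost', 'id', 'pulled_at', 'visits']
```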

Sorry for the really bad explanation. But it's basically advertising data, and there's no way to set the time scope below "Today" in the API. I want to store the data and look for trends, and also be able to combine it with other data.

Is there a better way of doing it? How would you store this data?

T SL (Courses Plus Student 378 Points)

To "normalize the data" with a start and end time, I'd subtract the previous record from the current one. That way each chunk would get its own start time, if that makes it easier.
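One way to sketch that subtraction in Python (field names are illustrative, taken from the sample payload): cumulative counters like cost and visits subtract cleanly, while a rate like ctr cannot be subtracted and is better recomputed or taken from the newer snapshot:

```python
CUMULATIVE = {"cost", "visits"}  # fields that only grow within a day

def interval_delta(curr, prev):
    """Turn two day-to-date snapshots into one interval record."""
    delta = {
        "start": prev["pulled_at"],
        "end": curr["pulled_at"],
    }
    for field in CUMULATIVE:
        delta[field] = curr[field] - prev[field]
    # ctr is a rate, not a counter: keep the latest reported value
    delta["ctr"] = curr["ctr"]
    return delta

prev = {"pulled_at": "00:05", "cost": 5.8, "visits": 209, "ctr": "0.04"}
curr = {"pulled_at": "00:11", "cost": 7.1, "visits": 260, "ctr": "0.05"}
d = interval_delta(curr, prev)
print(d["visits"], round(d["cost"], 2))  # 51 1.3
```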

Not sure what to search for to solve this problem.

2 Answers

If you're pulling data from the DB, might as well also write to it, with the timestamps in their own table and an auto-incrementing ID. Then just get the top/last row of that table to get the timestamp of the last pull.
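In MongoDB terms, the "get the last row" idea translates to sorting on the stored timestamp rather than an auto-incrementing ID. A sketch (collection name assumed), with the equivalent logic written against a plain list so it runs without a server:

```python
# Against a live collection this would be (collection name assumed):
#   last = coll.find_one(sort=[("pulled_at", -1)])

def last_snapshot(snapshots):
    """Pure-Python equivalent: the document with the newest pulled_at."""
    return max(snapshots, key=lambda s: s["pulled_at"])

snaps = [{"pulled_at": 1, "visits": 10},
         {"pulled_at": 3, "visits": 40},
         {"pulled_at": 2, "visits": 25}]
print(last_snapshot(snaps)["visits"])  # 40
```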

T SL (Courses Plus Student 378 Points)

Hi, thanks for getting back to me. Sorry, I should have explained better.

The data is from an external API, and unfortunately it's already aggregated.

The data can either be "day until now" or "forever until now". If I pull it in snapshots every x minutes, each snapshot will contain cumulative data that has grown since the last pull.

I.e. I can't pull data between t1 and t2 from the API -- there is no option to do so -- but I can pull data every x minutes and "make my own" intervals. To get a snapshot between t1 and t2, I could subtract snapshot n-1 from snapshot n. How could I do this?

Let's say I pull every hour. The last hour will be a summary of all the data until now. The hour snapshot before will be a summary up until that hour etc.

Would it make sense to pull this data and store "as is" in Mongo, and then do the subtraction as I render it (in Python)? Each snapshot would then serve as a total, and if I want the data between interval n-5 (last 5 hours) I can do snapshot_n - snapshot_(n-5).
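Storing the raw cumulative snapshots and subtracting at render time can be sketched like this (field names assumed; hours are floats here for simplicity, where real code would use the stored datetimes): to get the last 5 hours, subtract the snapshot closest to now minus 5 hours from the newest one:

```python
def window_delta(snapshots, hours_back, field):
    """Delta of a cumulative field between the newest snapshot and
    the one closest to `hours_back` hours earlier.

    snapshots: list of dicts ordered by pulled_at.
    """
    newest = snapshots[-1]
    target = newest["pulled_at"] - hours_back
    # pick the stored snapshot nearest the start of the window
    base = min(snapshots, key=lambda s: abs(s["pulled_at"] - target))
    return newest[field] - base[field]

snaps = [{"pulled_at": h, "visits": v}
         for h, v in [(1, 100), (2, 180), (3, 230), (4, 300),
                      (5, 390), (6, 450), (7, 520), (8, 600)]]
print(window_delta(snaps, 5, "visits"))  # 600 - 230 = 370
```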

The end goal would be to have this advertising data, and be able to see trends. Later I could use it to act upon trends. For now though, I'd just want to gather it. I can find a lot of information on time series data storage in Mongo, but nothing that pertains to my or similar problems.