
Python

T SL (Courses Plus Student 378 Points)

Best way to pull a dataset from Python into a NoSQL database

Hi,

I want to pull a dataset from an API every x minutes into a MongoDB database, then serve the data from Flask to a Vue app (that part I can do).

The data is in the format of

{ "id": 1234567, "placement": "abc1234", "ctr": "0.04", "cost": 5.8, "visits": 209, ..... }

The data doesn't include a timestamp, so on each successful pull I would store the pull time along with it.

Later, say 5 minutes later, I'd pull it again (via a task manager). By then the ctr might have changed, and cost and visits will have accumulated further. Every pull covers the period from the start of the day until the current time.

So I'd have

00:00 -- 00:05
00:00 -- 00:11
00:00 -- 00:16

etc.
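A minimal sketch of that pull-and-store step, assuming a hypothetical `fetch_report()` that returns the API's JSON dict; the pymongo call is only shown in a comment (collection names assumed), since the timestamping itself is plain Python:

```python
from datetime import datetime, timezone

def make_snapshot(raw):
    """Attach a pull timestamp to the raw API payload.

    The API data has no time field, so we record when the pull
    happened; all later interval calculations key off pulled_at.
    """
    doc = dict(raw)  # copy so the original payload is untouched
    doc["pulled_at"] = datetime.now(timezone.utc)
    return doc

# With a live MongoDB this would be stored as-is (names assumed):
#   from pymongo import MongoClient
#   coll = MongoClient()["ads"]["snapshots"]
#   coll.insert_one(make_snapshot(fetch_report()))

snap = make_snapshot({"id": 1234567, "cost": 5.8, "visits": 209})
print(sorted(snap))  # ['cost', 'id', 'pulled_at', 'visits']
```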

Sorry for the really bad explanation. But it's basically advertising data, and there's no way to set the time scope below "Today" in the API. I want to store the data and look for trends, and also be able to combine it with other data.

Is there a better way of doing it? How would you store this data?

T SL (Courses Plus Student 378 Points)

To "normalize the data" with a start and end time, I'd subtract the previous record from the current one. That way each chunk would get its own start time, if that makes it easier.
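One way to sketch that subtraction in Python (field names are illustrative, taken from the sample payload): cumulative counters like cost and visits subtract cleanly, while a rate like ctr cannot be subtracted and is better recomputed or taken from the newer snapshot:

```python
CUMULATIVE = {"cost", "visits"}  # fields that only grow within a day

def interval_delta(curr, prev):
    """Turn two day-to-date snapshots into one interval record."""
    delta = {
        "start": prev["pulled_at"],
        "end": curr["pulled_at"],
    }
    for field in CUMULATIVE:
        delta[field] = curr[field] - prev[field]
    # ctr is a rate, not a counter: keep the latest reported value
    delta["ctr"] = curr["ctr"]
    return delta

prev = {"pulled_at": "00:05", "cost": 5.8, "visits": 209, "ctr": "0.04"}
curr = {"pulled_at": "00:11", "cost": 7.1, "visits": 260, "ctr": "0.05"}
d = interval_delta(curr, prev)
print(d["visits"], round(d["cost"], 2))  # 51 1.3
```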

Not sure what to search for to solve this problem.

2 Answers

If you're pulling data from the DB, might as well also write to it, with the timestamps in their own table and an auto-incrementing ID. Then just get the top/last row of that table to get the timestamp of the last pull.
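In MongoDB terms, the "get the last row" idea translates to sorting on the stored timestamp rather than an auto-incrementing ID. A sketch (collection name assumed), with the equivalent logic written against a plain list so it runs without a server:

```python
# Against a live collection this would be (collection name assumed):
#   last = coll.find_one(sort=[("pulled_at", -1)])

def last_snapshot(snapshots):
    """Pure-Python equivalent: the document with the newest pulled_at."""
    return max(snapshots, key=lambda s: s["pulled_at"])

snaps = [{"pulled_at": 1, "visits": 10},
         {"pulled_at": 3, "visits": 40},
         {"pulled_at": 2, "visits": 25}]
print(last_snapshot(snaps)["visits"])  # 40
```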

T SL (Courses Plus Student 378 Points)

Hi, thanks for getting back to me. Sorry, I should have explained better.

The data is from an external API, and unfortunately it's already aggregated.

The data can either be "day until now" or "forever until now". If I pull it in snapshots every x minutes, each snapshot will contain cumulative data that has grown since the last pull.

I.e. I can't pull data between t1 and t2 from the API -- there is no option to do so -- but I can pull data every x minutes and "make my own" intervals. To get a snapshot between t1 and t2, I could subtract snapshot n-1 from snapshot n. How could I do this?

Let's say I pull every hour. The last hour will be a summary of all the data until now. The hour snapshot before will be a summary up until that hour etc.

Would it make sense to pull this data and store "as is" in Mongo, and then do the subtraction as I render it (in Python)? Each snapshot would then serve as a total, and if I want the data between interval n-5 (last 5 hours) I can do snapshot_n - snapshot_(n-5).
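Storing the raw cumulative snapshots and subtracting at render time can be sketched like this (field names assumed; hours are floats here for simplicity, where real code would use the stored datetimes): to get the last 5 hours, subtract the snapshot closest to now minus 5 hours from the newest one:

```python
def window_delta(snapshots, hours_back, field):
    """Delta of a cumulative field between the newest snapshot and
    the one closest to `hours_back` hours earlier.

    snapshots: list of dicts ordered by pulled_at.
    """
    newest = snapshots[-1]
    target = newest["pulled_at"] - hours_back
    # pick the stored snapshot nearest the start of the window
    base = min(snapshots, key=lambda s: abs(s["pulled_at"] - target))
    return newest[field] - base[field]

snaps = [{"pulled_at": h, "visits": v}
         for h, v in [(1, 100), (2, 180), (3, 230), (4, 300),
                      (5, 390), (6, 450), (7, 520), (8, 600)]]
print(window_delta(snaps, 5, "visits"))  # 600 - 230 = 370
```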

The end goal would be to have this advertising data, and be able to see trends. Later I could use it to act upon trends. For now though, I'd just want to gather it. I can find a lot of information on time series data storage in Mongo, but nothing that pertains to my or similar problems.