Rodrigo Muñoz
Courses Plus Student 20,171 Points
Building a scraper using the same principles as Node.js Basics
I'm trying to get a scraper working, but I'm confused about how to use a loop to send different queries to the main URL. For example, the URL is http://website.com/categories/.json? and I need to scrape through http://website.com/categories/.json?query=<numbers>. The <numbers> value is provided by each page, in JSON format, as the data.next attribute.
My question is: how can I use the data.next attribute inside a loop to keep scraping the website's pages? And what is the proper way to code this using the same principles as this course, like modules, forEach functions, etc.?
1 Answer
Iain Simmons
Treehouse Moderator 32,305 Points
I would put the scraping into a function that checks for the next property in data. If it finds it, the function calls itself again (a recursive function); if not, it just returns/ends.
You'll have to decide what to do with the data that is returned each time the function runs. The best case would be to return/yield a promise and do something with the result somewhere else (output to a file, output to the console, etc.).
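A minimal sketch of that recursive approach, under stated assumptions: scrapeAll and fetchPage are hypothetical names, and fetchPage is assumed to be any function that takes a query value and returns a promise for one page's parsed JSON. How the HTTP request itself is made is left out here.

```javascript
// Hypothetical helper names: scrapeAll and fetchPage are illustrative only.
// fetchPage(query) is assumed to return a promise that resolves to the
// parsed JSON of a single page.
function scrapeAll(fetchPage, query = '', pages = []) {
  return fetchPage(query).then((data) => {
    // Keep every page so the caller decides what to do with the results.
    pages.push(data);

    // data.next carries the query value for the following page.
    if (data.next) {
      return scrapeAll(fetchPage, data.next, pages); // recurse for the next page
    }

    // No next property: stop and hand back everything collected.
    return pages;
  });
}
```

Because the function returns a promise, the caller can chain .then() on it and write the pages to a file, the console, or a database in one place, separate from the scraping logic.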
Rodrigo Muñoz
Courses Plus Student 20,171 Points
Thanks for the answer. I will store the data in MongoDB, but I have another problem now: the code executes before the response comes back from the HTTP request. Do you know how I can fix this?
Iain Simmons
Treehouse Moderator 32,305 Points
Generally, the HTTP request library you are using will allow you to pass a callback function as an argument to the request function. That callback will generally have a parameter for the response, which you use to get the data once the request has completed.
Which library are you using? Node's default http or request?
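For Node's built-in http module, the callback pattern might look like the sketch below. getJSON is a hypothetical helper name, not part of http. It assumes the server answers with a JSON body and follows Node's error-first callback convention.

```javascript
const http = require('http');

// getJSON is a hypothetical helper, not part of Node's http module.
// It calls callback(err) on failure or callback(null, data) on success.
function getJSON(url, callback) {
  const request = http.get(url, (response) => {
    let body = '';
    response.setEncoding('utf8');

    // The body arrives in chunks; collect them until the response ends.
    response.on('data', (chunk) => {
      body += chunk;
    });

    // Only at 'end' is the full body available, so parse it here.
    response.on('end', () => {
      let data;
      try {
        data = JSON.parse(body);
      } catch (err) {
        callback(err);
        return;
      }
      callback(null, data);
    });
  });

  // Connection-level failures (DNS, refused connection, ...) land here.
  request.on('error', callback);
}

// Usage (not executed here); the URL is the placeholder from the question:
// getJSON('http://website.com/categories/.json?', (err, data) => {
//   if (err) { console.error(err.message); return; }
//   console.log(data.next);
// });
```

Any code that needs the data has to run inside the callback (or be called from it). Code placed after the getJSON call runs before the response arrives.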
Rodrigo Muñoz
Courses Plus Student 20,171 Points
I'm using http. The request library doesn't work on that page. I was using a do/while loop, but it doesn't wait for each callback function. It just executes everything, so I can only get the data from the first do{} block.
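A plain do/while loop cannot wait for a callback, because the loop condition is evaluated before any response has arrived. One possible fix, assuming a Node version that supports async/await, is to wrap http.get in a promise and await it inside the loop. The sketch below uses hypothetical names (getJSON, scrapeCategories, fetchJSON) and assumes each page's data.next holds the value for the query parameter, as described in the question.

```javascript
const http = require('http');

// Hypothetical promise wrapper around Node's built-in http.get.
function getJSON(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (response) => {
      let body = '';
      response.setEncoding('utf8');
      response.on('data', (chunk) => { body += chunk; });
      response.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}

// baseUrl is expected to end in "?", like the placeholder in the question.
// fetchJSON can be swapped out (for example in tests); it defaults to getJSON.
async function scrapeCategories(baseUrl, fetchJSON = getJSON) {
  const pages = [];
  let next = '';

  do {
    const url = next ? `${baseUrl}query=${encodeURIComponent(next)}` : baseUrl;

    // await pauses this function until the response has been parsed,
    // so the loop condition below sees the real data.next value.
    const data = await fetchJSON(url);
    pages.push(data);
    next = data.next;
  } while (next);

  return pages;
}

// Usage (not executed here):
// scrapeCategories('http://website.com/categories/.json?')
//   .then((pages) => console.log(pages.length))
//   .catch((err) => console.error(err.message));
```

The requests run one after another, which is what is needed here because each URL depends on the previous page's data.next. The collected pages can then be stored in one place once the promise resolves.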