Building a scraper using the same principles as Node.js Basics

I'm trying to get a scraper working, but I'm confused about how to use a loop to send different queries to the main URL. For example, the URL is http://website.com/categories/.json? and I need to scrape through http://website.com/categories/.json?query=<numbers>.

Each page provides <numbers> in its JSON response as the data.next attribute.

My question is: how can I use the data.next attribute inside a loop to keep scraping the website's pages? And what is the proper way to code this using the same principles as this course, such as modules, forEach, etc.?

1 Answer

I would put the scraping into a function that checks for the next property in data. If it finds one, it calls itself again (a recursive function); if not, it simply returns and the scraping ends.
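To make the recursive idea concrete, here is a minimal sketch using plain callbacks. The names scrapePages, fakeFetchPage and fakePages are made up for illustration, and I'm assuming the parsed JSON looks like { data: { next: "..." } }, so adjust the property access to whatever the real pages return. The fake fetch function stands in for the HTTP request so the example runs without a network.

```javascript
// Sketch only: scrapePages, fakeFetchPage and the page shape are
// illustrative assumptions, not code from the course or the site.

// Walks the pages one by one. fetchPage(query, callback) must call
// callback(error, page) once the page for that query has been loaded.
function scrapePages(fetchPage, query, onPage, onDone) {
  fetchPage(query, function (error, page) {
    if (error) {
      return onDone(error);
    }
    onPage(page);
    const next = page.data && page.data.next;
    if (next) {
      // Another page exists: the function calls itself with the new query.
      scrapePages(fetchPage, next, onPage, onDone);
    } else {
      // No next value: this was the last page, so the recursion ends.
      onDone(null);
    }
  });
}

// Stand-in for the real HTTP request (hypothetical data).
const fakePages = {
  first: { data: { items: ['a', 'b'], next: '101' } },
  '101': { data: { items: ['c'], next: '102' } },
  '102': { data: { items: ['d'] } }
};

function fakeFetchPage(query, callback) {
  // setImmediate mimics a response that arrives asynchronously.
  setImmediate(function () {
    callback(null, fakePages[query === null ? 'first' : query]);
  });
}

const collected = [];
scrapePages(
  fakeFetchPage,
  null,
  function (page) { collected.push(...page.data.items); },
  function (error) {
    if (error) { return console.error(error.message); }
    console.log(collected);
  }
);
```

Because the next request is only started inside the callback of the previous one, the pages are always fetched in order. In a real scraper, fakeFetchPage would be replaced by a function that builds the ?query=<numbers> URL and performs the request.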

You'll have to decide what to do with the data that is returned each time the function runs. The best approach would be to return a promise and handle the result somewhere else (write it to a file, log it to the console, etc.).
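As a rough sketch of the promise approach: the function below resolves with every page it visited, and the caller decides what to do with them. collectPages and fakeFetch are hypothetical names, and the { data: { next } } shape is again an assumption about the JSON.

```javascript
// Sketch only: collectPages, fakeFetch and the page shape are
// illustrative assumptions.

// Resolves with an array of every page, following data.next until it is missing.
function collectPages(fetchPage, query, pages = []) {
  return fetchPage(query).then(function (page) {
    pages.push(page);
    const next = page.data && page.data.next;
    // Returning the recursive call chains its promise onto this one.
    return next ? collectPages(fetchPage, next, pages) : pages;
  });
}

// Stand-in for a promise-returning HTTP request (hypothetical data).
function fakeFetch(query) {
  const pages = {
    start: { data: { next: '2' } },
    '2': { data: { next: '3' } },
    '3': { data: {} }
  };
  return Promise.resolve(pages[query || 'start']);
}

// The scraping code knows nothing about output; the caller handles the result.
collectPages(fakeFetch, null).then(function (pages) {
  console.log('Fetched ' + pages.length + ' pages');
});
```

Keeping the output step in the final then() means the same scraping function can later feed a file, the console or a database without being changed.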

Thanks for the answer. I will store the data in MongoDB, but I have another problem now: the rest of my code executes before the response comes back from the HTTP request. Do you know how I can fix this?

Generally, the HTTP request library you are using will let you pass a callback function as an argument to the request function. That callback usually takes the response as a parameter, which you use to read the data once the request has completed. Anything that depends on the data has to run inside that callback (or be called from it), not on the lines after the request.
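With Node's built-in http module, that pattern could look roughly like this. getJson is a hypothetical helper name; http.get and the response's data and end events are the real Node API. Note that with http the callback receives the response as soon as the headers arrive, so the body still has to be collected from the data events before it can be parsed.

```javascript
// Sketch only: getJson is an illustrative helper, not part of Node or the course.
const http = require('http');

// Requests a URL and hands the parsed JSON body to callback(error, data).
function getJson(url, callback) {
  const request = http.get(url, function (response) {
    let body = '';
    response.setEncoding('utf8');
    // The body arrives in chunks, so it is collected piece by piece.
    response.on('data', function (chunk) { body += chunk; });
    // 'end' fires once the whole body has been received.
    response.on('end', function () {
      let parsed;
      try {
        parsed = JSON.parse(body);
      } catch (error) {
        return callback(error);
      }
      callback(null, parsed);
    });
  });
  // Connection problems are reported through the same callback.
  request.on('error', callback);
}

// Usage (not run here because it needs the real site):
// getJson('http://website.com/categories/.json?', function (error, data) {
//   if (error) { return console.error(error.message); }
//   console.log(data);
// });
```

Code placed after the getJson call runs immediately, before any response exists, which is why the work that needs the data belongs inside the callback.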

Which library are you using? Node's built-in http module or the request package?

I'm using http. The request package doesn't work on that page. I was using a do/while loop, but it doesn't wait for each callback function. It just runs straight through, so I only get the data from the first do {} block.
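The behaviour described here is expected: a plain do/while loop runs synchronously, and callbacks cannot run until the current code has finished, so the loop condition is checked before any response has arrived. One possible way to keep the do/while shape is to make each request return a promise and await it inside an async function. The sketch below assumes a Node version with async/await; scrapeInOrder and delayedFetch are hypothetical names, and the page shape is an assumption.

```javascript
// Sketch only: scrapeInOrder, delayedFetch and the page shape are
// illustrative assumptions.

// Stand-in for an HTTP request that takes a little while to answer.
function delayedFetch(query) {
  const pages = {
    start: { data: { items: [1, 2], next: 'p2' } },
    p2: { data: { items: [3], next: 'p3' } },
    p3: { data: { items: [4] } }
  };
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(pages[query || 'start']); }, 10);
  });
}

async function scrapeInOrder(fetchPage) {
  const items = [];
  let query = null;
  do {
    // await pauses this function until the response has arrived,
    // so the loop condition always sees the latest data.next value.
    const page = await fetchPage(query);
    items.push(...page.data.items);
    query = page.data.next || null;
  } while (query);
  return items;
}

scrapeInOrder(delayedFetch).then(function (items) {
  console.log(items);
});
```

If async/await is not available, the recursive callback approach from the answer achieves the same ordering, because each request is only started after the previous one has finished.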