Kostas Koutoupis
15,010 Points
Pass multiple objects to an array - JavaScript
Code for the following question available at https://repl.it/@kkoutoup/ParliamentUKscraper
Question
I'm trying to create a web scraper with Node, request and cheerio. The scraper works in two phases (so far):
Phase One
The first function, scrapeCommitteesPage(), visits a page, scrapes all links, adds them to an array, and passes the array to a second function called getUniqueIDs().
//dependencies
const request = require('request');
const cheerio = require('cheerio');
//uk parliament - select committees page
const committeesListUrl = 'https://bit.ly/2YXZBC7';
//get all links to committee pages
const scrapeCommitteesPage = () => {
  request(committeesListUrl, (error, response, body) => {
    if (!error && response.statusCode === 200) {
      //pass response body to cheerio
      const $ = cheerio.load(body),
        //get all committee page links - ul.square-bullets-a-to-z li a
        committeeLinksArray = [],
        linkList = $('.square-bullets-a-to-z a'),
        parliamentUrl = 'https://www.parliament.uk';
      //push all links to array
      for (let i = 0; i < linkList.length; i++) {
        if (linkList[i]) {
          committeeLinksArray.push(`${parliamentUrl}${linkList[i].attribs.href}`);
        }
      }
      //pass array to uniqueIDs function
      getUniqueIDs(committeeLinksArray);
    } else {
      console.log(error);
    }//end 1st request else
  });//end request
};
Phase Two
The second function, getUniqueIDs(), loops through the array of links passed as an argument from the first function and saves information (name, ID, URL) into a committeeDetails object.
//get unique IDs
const getUniqueIDs = (committeeLinksArray) => {
  for (let i = 0; i < committeeLinksArray.length; i++) {
    if (committeeLinksArray[i]) {
      request(committeeLinksArray[i], (error, response, body) => {
        if (!error && response.statusCode === 200) {
          //pass response body to cheerio
          const $ = cheerio.load(body),
            committeeDetails = [],
            //save committee name, id and url for rss feed
            committeeName = $("meta[property='og:title']").attr("content"),
            uniqueID = $("meta[name='search:cmsPageInstanceId']").attr("content"),
            committeeRSSUrl = `https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=${uniqueID}&type=Committee_Detail_Mixed`;
          //push details object to committeeDetails array
          committeeDetails.push({
            'committee-name': committeeName,
            'committee-ID': uniqueID,
            'committee-RSS-URL': committeeRSSUrl
          });
          //pass details to visitCommitteePage function
          visitCommitteePage(committeeDetails);
        } else {
          console.log(error);
        }
      });
    }
  }//end for loop
};
When I console.log the object to make sure all the information is there, I get separate single-item arrays, each holding one object. What I need is a single array of objects that will later be manipulated in a third function, visitCommitteePage().
So the output I get looks like:
[ { 'committee-name': 'Defence Sub-Committee',
'committee-ID': '105517',
'committee-RSS-URL':
'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' } ]
[ { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
'committee-ID': '115803',
'committee-RSS-URL':
'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' } ]
whereas what I want is an array of objects that would look like this:
[ { 'committee-name': 'Defence Sub-Committee',
'committee-ID': '105517',
'committee-RSS-URL':
'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' },
{ 'committee-name': 'Business, Energy and Industrial Strategy Committee',
'committee-ID': '115803',
'committee-RSS-URL':
'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' }
]
How can I achieve that?
Apologies for the long question.
Code available at https://repl.it/@kkoutoup/ParliamentUKscraper
1 Answer
Steven Parker
231,269 Points
In getUniqueIDs, the loop that goes through the committeeLinksArray clears the committeeDetails array, pushes an object into it, and then calls visitCommitteePage, once for each item.
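To illustrate the point, here is a minimal sketch with the network requests stripped out. The function names perItem and collected and the sample names array are hypothetical; they exist only to contrast creating the array inside the loop with creating it once before the loop.

```javascript
// Hypothetical stand-in data; the real code scrapes these values from each page.
const names = [
  'Defence Sub-Committee',
  'Business, Energy and Industrial Strategy Committee'
];

// Array created INSIDE the loop: every iteration starts from an empty array,
// so each hand-off receives an array containing exactly one object.
function perItem(items) {
  const handedOff = [];
  for (const name of items) {
    const committeeDetails = [];
    committeeDetails.push({ 'committee-name': name });
    handedOff.push(committeeDetails); // stands in for visitCommitteePage(committeeDetails)
  }
  return handedOff;
}

// Array created ONCE before the loop: objects accumulate in the same array,
// which is handed off a single time after the loop.
function collected(items) {
  const committeeDetails = [];
  for (const name of items) {
    committeeDetails.push({ 'committee-name': name });
  }
  return committeeDetails;
}

console.log(perItem(names));
console.log(collected(names));
```

The first call logs two arrays of one object each, which matches the output in the question; the second logs one array holding both objects.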
From your description, it sounds like you want to clear committeeDetails only once before the loop starts, and then call visitCommitteePage after the loop ends and all the objects have been added to the array. This would make it work more like what scrapeCommitteesPage does.
So with that in mind, a little re-arrangement of the statements should get it working as you intended.
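One thing to watch for when re-arranging: request() is asynchronous, so its callbacks run after the for loop has already finished. "After the loop ends" therefore has to mean "after every request has completed". A minimal sketch of one way to do that, using Promise.all, is below. It is not the only approach, and several names are assumptions made for illustration: fetchCommitteeDetails, the lookup parameter, fakeLookup, and the example.com URLs are all hypothetical. fakeLookup stands in for the request and cheerio work so the sketch runs without network access.

```javascript
// Hypothetical helper: wraps one callback-style lookup in a Promise.
// In the real scraper, `lookup` would call request() and parse the body with cheerio.
function fetchCommitteeDetails(url, lookup) {
  return new Promise((resolve, reject) => {
    lookup(url, (error, details) => {
      if (error) reject(error);
      else resolve(details);
    });
  });
}

// Starts every lookup, then resolves with ONE array holding one object per link,
// in the same order as committeeLinksArray.
function getUniqueIDs(committeeLinksArray, lookup) {
  return Promise.all(
    committeeLinksArray.map((url) => fetchCommitteeDetails(url, lookup))
  );
}

// Hypothetical fake lookup: derives an ID from the URL instead of scraping a page.
function fakeLookup(url, callback) {
  const id = url.split('/').pop();
  setTimeout(() => callback(null, {
    'committee-name': `Committee ${id}`,
    'committee-ID': id,
    'committee-RSS-URL': `https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=${id}&type=Committee_Detail_Mixed`
  }), 0);
}

// Placeholder links, not real committee pages.
const links = [
  'https://example.com/committee/105517',
  'https://example.com/committee/115803'
];

getUniqueIDs(links, fakeLookup)
  .then((committeeDetails) => {
    // The single array of objects; visitCommitteePage(committeeDetails) would go here.
    console.log(committeeDetails);
  })
  .catch((error) => console.log(error));
```

Note that Promise.all rejects as soon as any one lookup fails, so a single bad page would skip the hand-off entirely. If partial results are acceptable, Promise.allSettled is an alternative worth considering.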