Welcome to the Treehouse Community

JavaScript

Kostas Koutoupis
15,010 Points

Pass multiple objects to an array - JavaScript

Code for the following question is available at https://repl.it/@kkoutoup/ParliamentUKscraper

Question

I'm trying to create a web scraper with Node, request and cheerio. The scraper works in two phases (so far):

Phase One

The first function, scrapeCommitteesPage(), visits a page, scrapes all links, adds them to an array, and passes the array to a second function called getUniqueIDs().

//dependencies
const request = require('request');
const cheerio = require('cheerio');

//uk parliament - select committees page
const committeesListUrl = 'https://bit.ly/2YXZBC7';

//get all links to committee pages
const scrapeCommitteesPage=()=>{
  request(committeesListUrl, (error, response, body)=>{
    if(!error && response.statusCode === 200){
      //pass response body to cheerio
      const $ = cheerio.load(body),
            committeeLinksArray = [],
            //get all committee page links - ul.square-bullets-a-to-z li a
            linkList = $('.square-bullets-a-to-z a'),
            parliamentUrl = 'https://www.parliament.uk';
      //push all links (made absolute with parliamentUrl) to array, skipping anchors without an href
      for(let i=0;i<linkList.length;i++){
        if(linkList[i].attribs.href){
          committeeLinksArray.push(`${parliamentUrl}${linkList[i].attribs.href}`);
        }
      }
      //pass array to getUniqueIDs function
      getUniqueIDs(committeeLinksArray);
    }else{
      //error is null when the request succeeded but returned a non-200 status
      console.log(error || `Request failed with status ${response.statusCode}`);
    }//end 1st request else
  })//end request
}
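The link-collecting step can also be written with filter and map instead of an index loop. The sketch below is a hypothetical helper, buildCommitteeLinks, that takes the raw href values and the site root and returns absolute URLs. In the scraper, those hrefs could come from cheerio with something like $('.square-bullets-a-to-z a').map((i, el) => $(el).attr('href')).get(); the sample paths below are made up.

```javascript
// Hypothetical helper (not part of the original scraper):
// turns relative hrefs into absolute committee URLs.
// Missing or empty hrefs are skipped instead of producing a bare site-root URL.
const buildCommitteeLinks = (hrefs, baseUrl) =>
  hrefs
    .filter((href) => typeof href === 'string' && href.length > 0)
    .map((href) => `${baseUrl}${href}`);

// Example with made-up paths; the undefined entry stands for an <a> without an href
const links = buildCommitteeLinks(
  ['/business/committees/a', undefined, '/business/committees/b'],
  'https://www.parliament.uk'
);
console.log(links);
```

Keeping the URL-building separate from the request callback makes it easy to check without hitting the network.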

Phase Two

The second function, getUniqueIDs(), loops through the array of links passed in from the first function and, for each link, saves the committee's information (name, ID and RSS feed URL) as an object in a committeeDetails array.

//get unique IDs
const getUniqueIDs=(committeeLinksArray)=>{
  for(let i=0;i<committeeLinksArray.length;i++){
    if(committeeLinksArray[i]){
      request(committeeLinksArray[i], (error, response, body)=>{
        if(!error && response.statusCode === 200){
          //pass response body to cheerio
          const $ = cheerio.load(body),
                committeeDetails = [],
                //save committee name, id and url for rss feed
                committeeName = $("meta[property='og:title']").attr("content"),
                uniqueID = $("meta[name='search:cmsPageInstanceId']").attr("content"),
                committeeRSSUrl = `https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=${uniqueID}&type=Committee_Detail_Mixed`;
          //push details object to committeeDetails array      
          committeeDetails.push({
            'committee-name': committeeName,
            'committee-ID': uniqueID,
            'committee-RSS-URL': committeeRSSUrl
          })
          //pass details to visitCommitteePage function
          visitCommitteePage(committeeDetails)
        }else{
          console.log(error);
        }
      }) 
    }
  }//end for loop 
}

When I console.log the object to make sure all the information is there, I get a separate single-element array for each committee, but what I need is one array containing all the objects, which will later be manipulated in a third function, visitCommitteePage().

So the output I get looks like:

[ { 'committee-name': 'Defence Sub-Committee',
    'committee-ID': '105517',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' } ]
[ { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
    'committee-ID': '115803',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' } ]

whereas what I want is an array of objects that would look like this:

[ { 'committee-name': 'Defence Sub-Committee',
    'committee-ID': '105517',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' },
   { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
    'committee-ID': '115803',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' } 
]

How can I achieve that?

Apologies for the long question.

Code available at https://repl.it/@kkoutoup/ParliamentUKscraper

1 Answer

Steven Parker
228,991 Points

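In getUniqueIDs, every pass through the loop over committeeLinksArray creates a new, empty committeeDetails array (inside the request callback), pushes one object into it, and then calls visitCommitteePage with that one-item array. That is why each committee shows up as its own single-element array.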

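From your description, it sounds like you want to create committeeDetails only once, before the loop starts, and then call visitCommitteePage once, after all the objects have been added to the array. This would make it work more like what scrapeCommitteesPage does. One caveat: request is asynchronous, so its callbacks run after the for loop has already finished. A call placed directly after the loop would receive an empty array; the call has to happen in the callback that handles the last response, for example by counting completed requests and calling visitCommitteePage when the count reaches committeeLinksArray.length.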
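A synchronous sketch (no network calls) shows the difference between creating the array inside the loop and creating it once before the loop. The committee names are taken from the output above; perItem, allAtOnce and the collect callback are hypothetical stand-ins for the loop in getUniqueIDs and for visitCommitteePage.

```javascript
const names = ['Defence Sub-Committee', 'Business, Energy and Industrial Strategy Committee'];

// Current shape: a new array per item, and the callback runs once per item.
const perItem = (items, collect) => {
  for (const name of items) {
    const details = [];            // recreated on every iteration
    details.push({ 'committee-name': name });
    collect(details);              // called with a one-element array each time
  }
};

// Intended shape: one array created before the loop, callback called once after it.
const allAtOnce = (items, collect) => {
  const details = [];              // created once
  for (const name of items) {
    details.push({ 'committee-name': name });
  }
  collect(details);                // called once with every object
};

const perItemCalls = [];
perItem(names, (arr) => perItemCalls.push(arr));

const allAtOnceCalls = [];
allAtOnce(names, (arr) => allAtOnceCalls.push(arr));

console.log(perItemCalls.length, allAtOnceCalls.length); // 2 1
```

This version only works as written because its loop body is synchronous; with request, the pushes happen later, inside the callbacks.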

So with that in mind, a little re-arrangement of the statements should get it working as you intended.
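Because request calls its callback asynchronously, "after the loop ends" needs a way to know when the last response has arrived. Here is a minimal sketch, assuming a simple counter of finished requests. fakeRequest is a hypothetical stand-in for request that answers after a random delay with the URL as the body, extractDetails stands in for the cheerio parsing, and visitCommitteePage is passed in as a parameter only so the sketch runs on its own. Failed requests are counted too, so visitCommitteePage still runs exactly once even if some pages error.

```javascript
// Hypothetical stand-in for request(url, callback): responds after a random delay.
const fakeRequest = (url, callback) => {
  setTimeout(() => callback(null, { statusCode: 200 }, url), Math.random() * 50);
};

// Hypothetical stand-in for the cheerio parsing: uses the last URL segment as the ID.
const extractDetails = (body) => {
  const uniqueID = body.split('/').pop();
  return {
    'committee-ID': uniqueID,
    'committee-RSS-URL': `https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=${uniqueID}&type=Committee_Detail_Mixed`
  };
};

const getUniqueIDs = (committeeLinksArray, visitCommitteePage) => {
  const committeeDetails = []; // created once, shared by every callback
  let finished = 0;            // number of responses handled so far

  // With no links there are no callbacks, so hand over the empty array right away.
  if (committeeLinksArray.length === 0) {
    visitCommitteePage(committeeDetails);
    return;
  }

  committeeLinksArray.forEach((link) => {
    fakeRequest(link, (error, response, body) => {
      if (!error && response.statusCode === 200) {
        committeeDetails.push(extractDetails(body));
      } else {
        console.log(error);
      }
      finished += 1;
      // Only the last callback to finish passes the complete array on.
      if (finished === committeeLinksArray.length) {
        visitCommitteePage(committeeDetails);
      }
    });
  });
};

// Example run with made-up URLs
getUniqueIDs(
  ['https://example.com/105517', 'https://example.com/115803'],
  (details) => console.log(details)
);
```

Objects are pushed in the order the responses arrive, not the order of the links; if link order matters, storing each result at committeeDetails[index] instead of pushing would preserve it. With a promise-based HTTP client, Promise.all over the list of requests is another way to wait for every response.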