JavaScript

Kostas Koutoupis
15,010 Points

Pass multiple objects to an array - JavaScript

Code for the following question available at https://repl.it/@kkoutoup/ParliamentUKscraper

Question: I'm trying to create a web scraper with Node, request, and cheerio. The scraper works in two phases (so far):

Phase One: The first function, scrapeCommitteesPage(), visits a page, scrapes all links, adds them to an array, and passes the array to a second function called getUniqueIDs().

//dependencies
const request = require('request');
const cheerio = require('cheerio');

//uk parliament - select committees page
const committeesListUrl = 'https://bit.ly/2YXZBC7';

//get all links to committee pages
const scrapeCommitteesPage = () => {
  request(committeesListUrl, (error, response, body)=>{
    if(!error && response.statusCode == 200){
      //pass response body to cheerio
      const $ = cheerio.load(body),
      //get all committee page links - ul.square-bullets-a-to-z li a
            committeeLinksArray = [],
            linkList = $('.square-bullets-a-to-z a'),
            parliamentUrl = 'https://www.parliament.uk';
      //push all links to array
      for(let i=0;i<linkList.length;i++){
        if(linkList[i]){
          committeeLinksArray.push(`${parliamentUrl}${linkList[i].attribs.href}`);
        }
      }
    //pass array to uniqueIDs function
    getUniqueIDs(committeeLinksArray);  
    }else{
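      //error is null when the request succeeded with a non-200 status
      console.log(error || `Unexpected status code: ${response.statusCode}`);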
    }//end 1st request else
  })//end request
}

Phase Two: The second function, getUniqueIDs(), loops through the array of links passed in from the first function and saves each committee's information (name, id, url) as an object in a committeeDetails array.

//get unique IDs
const getUniqueIDs = (committeeLinksArray) => {
  for(let i=0;i<committeeLinksArray.length;i++){
    if(committeeLinksArray[i]){
      request(committeeLinksArray[i], (error, response, body)=>{
        if(!error && response.statusCode == 200){
          //pass response body to cheerio
          const $ = cheerio.load(body),
                committeeDetails = [],
                //save committee name, id and url for rss feed
                committeeName = $("meta[property='og:title']").attr("content"),
                uniqueID = $("meta[name='search:cmsPageInstanceId']").attr("content"),
                committeeRSSUrl = `https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=${uniqueID}&type=Committee_Detail_Mixed`;
          //push details object to committeeDetails array      
          committeeDetails.push({
            'committee-name': committeeName,
            'committee-ID': uniqueID,
            'committee-RSS-URL': committeeRSSUrl
          })
          //pass details to visitCommitteePage function
          visitCommitteePage(committeeDetails)
        }else{
          console.log(error);
        }
      }) 
    }
  }//end for loop 
}

When I console.log the result to check that all the information is there, I get a separate single-element array for each committee. What I need is a single array of objects that will later be manipulated in a third function, visitCommitteePage().

So the output I get looks like:

[ { 'committee-name': 'Defence Sub-Committee',
    'committee-ID': '105517',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' } ]
[ { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
    'committee-ID': '115803',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' } ]

whereas what I want is an array of objects that would look like this:

[ { 'committee-name': 'Defence Sub-Committee',
    'committee-ID': '105517',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=105517&type=Committee_Detail_Mixed' },
   { 'committee-name': 'Business, Energy and Industrial Strategy Committee',
    'committee-ID': '115803',
    'committee-RSS-URL':
     'https://www.parliament.uk/g/rss/committee-feed/?pageInstanceId=115803&type=Committee_Detail_Mixed' } 
]

How can I achieve that?

Apologies for the long question.

Code available at https://repl.it/@kkoutoup/ParliamentUKscraper

1 Answer

Steven Parker
216,014 Points

In getUniqueIDs, the loop that goes through the committeeLinksArray clears the committeeDetails array, pushes an object into it, and then calls visitCommitteePage for each item.

From your description, it sounds like you want to clear committeeDetails only once before the loop starts, and then call visitCommitteePage after the loop ends and all the objects have been added to the array. This would make it work more like what scrapeCommitteesPage does.

So with that in mind, a little re-arrangement of the statements should get it working as you intended.
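One thing to keep in mind with that re-arrangement: request is asynchronous, so its callbacks run after the for loop has already finished, and a call placed directly after the loop would likely see an empty array. Below is a minimal sketch of one way to handle this, not a definitive implementation. It assumes a hypothetical fetchDetails function standing in for the request + cheerio step (it is not part of the original code), and it takes visitCommitteePage as a parameter purely for illustration. The array is declared once, and a pending counter delays the final call until every callback has reported back.

```javascript
// Hypothetical stand-in for the request + cheerio step: it calls back
// asynchronously with one details object per link (no network needed).
const fetchDetails = (link, callback) => {
  setTimeout(() => {
    const uniqueID = link.split('/').pop();
    callback(null, {
      'committee-name': `Committee ${uniqueID}`,
      'committee-ID': uniqueID
    });
  }, 0);
};

// visitCommitteePage is passed in as a parameter here (an assumption made
// for this sketch; the original code calls it as a top-level function).
const getUniqueIDs = (committeeLinksArray, visitCommitteePage) => {
  // declare the array once, outside the loop and outside the callbacks
  const committeeDetails = [];
  // number of callbacks that have not finished yet
  let pending = committeeLinksArray.length;

  // nothing to wait for when there are no links
  if (pending === 0) return visitCommitteePage(committeeDetails);

  for (let i = 0; i < committeeLinksArray.length; i++) {
    fetchDetails(committeeLinksArray[i], (error, details) => {
      if (error) {
        console.log(error);
      } else {
        committeeDetails.push(details);
      }
      // hand the array on only once every callback has finished
      pending--;
      if (pending === 0) {
        visitCommitteePage(committeeDetails);
      }
    });
  }
};
```

With real network requests the objects are pushed in the order the responses arrive, which may differ from the order of the links, so sort the array afterwards if the order matters.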