
Python > Scraping Data From the Web > A World Full of Spiders > Crawling Spiders

Jonathan Kuhl
26,133 Points

Not getting any data at all from spider

I've been following along with the video and made my spider with almost no changes to the code, but I'm getting no results from it. The crawl report says zero web pages were crawled, even though the URLs were copied and pasted straight from the pages Treehouse provided:

import scrapy

class HorseSpider(scrapy.Spider):
    name = 'ike'

    def start_request(self):
        urls = [
            'https://treehouse-projects.github.io/horse-land/index.html',
            'https://treehouse-projects.github.io/horse-land/mustang.html'
        ]
        return [scrapy.Request(url=url, callback=self.parse) for url in urls]

    def parse(self, response):
        url = response.url
        page = url.split('/')[-1]
        filename = 'horses-%s' % page
        print('URL: {}'.format(url))
        with open(filename, 'wb') as file:
            file.write(response.body)
        print('Saved as %s' % filename)

What am I missing?

2 Answers

That's just a convention. Yes, you can call it start_request or start_requests, but you do have to be consistent. Good catch on that detail, though, Trevor!

Actually, now that I've reached the end of the video and heard @kenalger say "we need start_requests and parse", I'm not so sure...
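
For what it's worth, my reading of the Scrapy docs is that the name isn't arbitrary: the engine looks up a method called exactly start_requests on the spider, and the base Spider class already provides a default one that builds requests from a start_urls list and sends each response to parse. So a minimal sketch like this (same two Treehouse URLs, no start_requests override at all) should also crawl both pages:

import scrapy

class HorseSpider(scrapy.Spider):
    name = 'ike'

    # The default start_requests() on scrapy.Spider builds a request for each
    # entry in start_urls and passes the response to self.parse.
    start_urls = [
        'https://treehouse-projects.github.io/horse-land/index.html',
        'https://treehouse-projects.github.io/horse-land/mustang.html',
    ]

    def parse(self, response):
        page = response.url.split('/')[-1]
        filename = 'horses-%s' % page
        with open(filename, 'wb') as file:
            file.write(response.body)
        print('Saved as %s' % filename)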

jessechapman
4,088 Points

I made the same mistake as the original poster (defining a "start_request" method). Changing it to start_requests fixed it for me.
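
For reference, here is roughly what the original spider looks like after the rename. Yielding the requests instead of returning a list also works and is the style the Scrapy docs tend to use; everything else is unchanged from the question:

import scrapy

class HorseSpider(scrapy.Spider):
    name = 'ike'

    def start_requests(self):  # plural: Scrapy's engine calls this exact name
        urls = [
            'https://treehouse-projects.github.io/horse-land/index.html',
            'https://treehouse-projects.github.io/horse-land/mustang.html',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split('/')[-1]
        filename = 'horses-%s' % page
        with open(filename, 'wb') as file:
            file.write(response.body)
        print('Saved as %s' % filename)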