Python › Scraping Data From the Web › A World Full of Spiders › Crawling Spiders

Jonathan Kuhl
26,130 Points

Not getting any data at all from spider

I've been following the video and made my spider with almost no changes to the code, but I'm getting no results from it. The crawl report says zero pages were crawled, even though the URLs were copied and pasted from the pages Treehouse provided:

import scrapy

class HorseSpider(scrapy.Spider):
    name = 'ike'

    def start_request(self):
        urls = [
            'https://treehouse-projects.github.io/horse-land/index.html',
            'https://treehouse-projects.github.io/horse-land/mustang.html'
        ]
        return [scrapy.Request(url=url, callback=self.parse) for url in urls]

    def parse(self, response):
        url = response.url
        page = url.split('/')[-1]
        filename = 'horses-%s' % page
        print('URL: {}'.format(url))
        with open(filename, 'wb') as file:
            file.write(response.body)
        print('Saved as %s' % filename)

What am I missing?

1 Answer

Trevor J
Python Web Development Techdegree Student 2,078 Points

Check def start_request(self): — it should be def start_requests(self): (plural).

That's actually more than a convention: Scrapy looks for a method named exactly start_requests (plural) and calls it to generate the initial requests. Because the spider defines start_request instead, Scrapy never finds it, falls back to the (empty) start_urls attribute, and therefore crawls zero pages. Good catch pointing out this detail, Trevor!

Actually, now that I've reached the end of the video, @kenalger says "we need start_requests and parse," so now I'm not sure...