Getting a 401 response when submitting form - Scrapy FormSpider

Here is the output from the scrapy log:

2019-04-19 08:17:14 [scrapy.extensions.telnet] DEBUG: Telnet console listening on
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (404) <GET /robots.txt> (referer: None)
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET /horse-land/form.html> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (405) <GET> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (401) <POST> (referer:
2019-04-19 08:17:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://formspree.io/>: HTTP status code is not handled or not allowed
2019-04-19 08:17:15 [scrapy.core.engine] INFO: Closing spider (finished)

Here is my code:

from scrapy.http import FormRequest
from scrapy.spiders import Spider

class FormSpider(Spider):

    name = 'horseForm'

    start_urls = ['']

    def parse(self, response):
        formdata = {'firstname': 'Brian',
                    'lastname': 'Weber',
                    'jobtitle': 'Developer'}
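        # Fill the first form on the page and send the result to after_post
        return FormRequest.from_response(response, formnumber=0,
                                         formdata=formdata,
                                         callback=self.after_post)
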
    def after_post(self, response):
        print('\n\n*******\nForm processed.\n')

Note that when I submit the form manually here, it says that the form needs to be set up.
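To see roughly what the FormRequest.from_response call sends, it can help to look at the request body. For a form whose method is POST, Scrapy takes the form's own input values, overrides them with formdata, and URL-encodes the result as an application/x-www-form-urlencoded body. The sketch below reproduces only the formdata part with the standard library; any hidden inputs the real form may contain are unknown here and are left out.

```python
from urllib.parse import parse_qs, urlencode

# The same field values the spider passes as formdata.
formdata = {
    "firstname": "Brian",
    "lastname": "Weber",
    "jobtitle": "Developer",
}

# A POST form sends its fields URL-encoded in the request body.
# (The real request would also include any fields already present in the HTML form.)
body = urlencode(formdata)
print(body)  # firstname=Brian&lastname=Weber&jobtitle=Developer

# Decoding the body recovers the original fields.
print(parse_qs(body))
```

If the server still answers 401 with a body like this, the problem is on the receiving end (for example, the form endpoint not being set up), not in how the fields are encoded.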

1 Answer

Noah Caldwell-Gatsos

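Yup, that's a common error when running Scrapy, and it has to do with the way the website is set up. The 401 comes from the form endpoint (formspree.io) rejecting the POST, which fits with the "form needs to be set up" message you saw. Scrapy's HttpError middleware then drops every response outside the 2xx range unless its status code is explicitly allowed, which is the "Ignoring response" line in your log.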

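To let the spider keep running and pass those responses to your callbacks, add the status codes to HTTPERROR_ALLOWED_CODES in your project's settings.py. Put them in a single list, because three separate assignments would leave only the last one in effect:

    HTTPERROR_ALLOWED_CODES = [401, 404, 405]

This only stops Scrapy from ignoring the responses; the server will keep returning 401 until the form itself is set up.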
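As a rough model of why the log says "Ignoring response <401 ...>", here is a simplified, standard-library-only approximation of the decision Scrapy's HttpErrorMiddleware makes for each response. It is not Scrapy's source: FakeResponse and passes_httperror_filter are hypothetical names, and the spider-level handle_httpstatus_list attribute is not modeled. It also shows why assigning HTTPERROR_ALLOWED_CODES three times keeps only the last value.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy Response: only the fields the filter reads."""
    status: int
    meta: dict = field(default_factory=dict)


def passes_httperror_filter(response, allowed_codes, allow_all=False):
    """Simplified model: True if the response would reach the spider callback."""
    # 2xx responses always go through.
    if 200 <= response.status < 300:
        return True
    # A per-request opt-in via Request.meta takes precedence over settings.
    if response.meta.get("handle_httpstatus_all", False):
        return True
    if "handle_httpstatus_list" in response.meta:
        return response.status in response.meta["handle_httpstatus_list"]
    # Otherwise fall back to project-level settings.
    if allow_all:
        return True
    return response.status in allowed_codes


# Three separate assignments: each one replaces the previous value.
HTTPERROR_ALLOWED_CODES = [404]
HTTPERROR_ALLOWED_CODES = [405]
HTTPERROR_ALLOWED_CODES = [401]
print(HTTPERROR_ALLOWED_CODES)  # [401]

# One assignment listing every code keeps all three.
HTTPERROR_ALLOWED_CODES = [401, 404, 405]

for status in (200, 401, 405, 500):
    ok = passes_httperror_filter(FakeResponse(status), HTTPERROR_ALLOWED_CODES)
    print(status, "-> callback" if ok else "-> ignored")
# 200 -> callback, 401 -> callback, 405 -> callback, 500 -> ignored
```

In a real spider, the per-request equivalent is setting handle_httpstatus_list in the request's meta, which limits the exception to that one request instead of the whole project.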

If you want to see more information on the error, refer to this Stack Exchange post: