
BRIAN WEBER
19,793 Points

Getting a 401 response when submitting form - Scrapy FormSpider

Here is the output from the scrapy log:

2019-04-19 08:17:14 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://treehouse-projects.github.io/robots.txt> (referer: None)
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://formspree.io/robots.txt> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (401) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2019-04-19 08:17:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2019-04-19 08:17:15 [scrapy.core.engine] INFO: Closing spider (finished)

Here is my code:

from scrapy.http import FormRequest
from scrapy.spiders import Spider


class FormSpider(Spider):

    name = 'horseForm'

    start_urls = ['https://treehouse-projects.github.io/horse-land/form.html']

    def parse(self, response):
        formdata = {'firstname': 'Brian',
                    'lastname': 'Weber',
                    'jobtitle': 'Developer'}
        return FormRequest.from_response(response, formnumber=0,
                                         formdata=formdata,
                                         callback=self.after_post)

    def after_post(self, response):
        print('\n\n*******\nForm processed.\n')
        print(response)
        print('\n******\n')

Note that when I try to submit the form manually at https://treehouse-projects.github.io/horse-land/form, it says that the form needs to be set up.

1 Answer

Noah Caldwell-Gatsos
1,420 Points

Yup, that's a common error when running Scrapy. It has to do with the way the website is set up.

To still run your callback on these responses, add this line to your settings.py:

HTTPERROR_ALLOWED_CODES = [401, 404, 405]
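Here is a minimal sketch of what that setting does, assuming the goal is to let the 401 from the log reach after_post. The helper reaches_callback is hypothetical: it is a simplified model of the status check that Scrapy's HttpErrorMiddleware applies, not Scrapy's own code, and it ignores per-request and per-spider overrides.

```python
# settings.py fragment. A name assigned several times keeps only its last
# value, so every status code that should get through goes in one list.
HTTPERROR_ALLOWED_CODES = [401, 404, 405]


def reaches_callback(status, allowed_codes=HTTPERROR_ALLOWED_CODES):
    """Hypothetical, simplified model of the middleware's decision.

    2xx responses always reach the spider's callback; any other status
    reaches it only if it appears in the allowed list.
    """
    return 200 <= status < 300 or status in allowed_codes


if __name__ == "__main__":
    # Status codes taken from the log above, plus one that is not allowed.
    for status in (200, 401, 405, 500):
        print(status, reaches_callback(status))
```

Allowing the 401 only hands the response to after_post instead of dropping it. It does not make the form submission succeed, so the callback should still look at response.status before treating the form as processed.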

If you want to see more information on the error, refer to this Stack Overflow post: https://stackoverflow.com/questions/48030717/scrapy-404-error-http-status-code-is-not-handled-or-not-allowed/48031332