BRIAN WEBER
21,570 Points

Getting a 401 response when submitting form - Scrapy FormSpider

Here is the output from the scrapy log:

2019-04-19 08:17:14 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://treehouse-projects.github.io/robots.txt> (referer: None)
2019-04-19 08:17:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://formspree.io/robots.txt> (referer: None)
2019-04-19 08:17:15 [scrapy.core.engine] DEBUG: Crawled (401) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2019-04-19 08:17:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2019-04-19 08:17:15 [scrapy.core.engine] INFO: Closing spider (finished)

Here is my code:

from scrapy.http import FormRequest
from scrapy.spiders import Spider


class FormSpider(Spider):

    name = 'horseForm'

    start_urls = ['https://treehouse-projects.github.io/horse-land/form.html']

    def parse(self, response):
        formdata = {'firstname': 'Brian',
                    'lastname': 'Weber',
                    'jobtitle': 'Developer'}
        return FormRequest.from_response(response, formnumber=0,
                                         formdata=formdata,
                                         callback=self.after_post)

    def after_post(self, response):
        print('\n\n*******\nForm processed.\n')
        print(response)
        print('\n******\n')

Note that when I submit the form manually at https://treehouse-projects.github.io/horse-land/form, the page says that the form needs to be set up.

1 Answer

Noah Caldwell-Gatsos
1,420 Points

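Yup, that's a common error when running Scrapy. It has to do with the way the website is set up rather than with your spider: by default Scrapy drops any response whose status is not in the 2xx range before it reaches your callback, which is why the log says "HTTP status code is not handled or not allowed" and after_post never runs. Only the 401 on the POST is reported as ignored in your log. It comes from formspree.io, which fits what you saw when submitting the form manually.

To still run the program, add this line to your settings.py: HTTPERROR_ALLOWED_CODES = [401, 404, 405]. Put all the codes in a single list. If you assign the setting three times, each assignment overwrites the previous one and only the last value is kept. Note that this only lets the 401 response reach after_post; it does not make the form submission succeed.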
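Here is a minimal sketch of how that could look, assuming you want the 401 to reach your callback and print something readable. HTTPERROR_ALLOWED_CODES is the real Scrapy setting; describe_status is a hypothetical helper name I made up for illustration, and the messages are just examples.

```python
# settings.py (sketch): one list with every non-2xx status the spider should receive.
# Repeating the assignment on separate lines would keep only the last value.
HTTPERROR_ALLOWED_CODES = [401, 404, 405]


# Spider module (sketch): hypothetical helper, not part of Scrapy.
def describe_status(status):
    """Return a short message for an HTTP status code seen in a callback."""
    if 200 <= status < 300:
        return 'Form processed.'
    if status == 401:
        return 'Form rejected with 401 (Unauthorized).'
    return 'Unexpected status: {}'.format(status)
```

In after_post you could then call print(describe_status(response.status)) instead of always printing "Form processed.". If you would rather not change the project settings, setting the spider attribute handle_httpstatus_list = [401] on FormSpider should have the same effect for that one spider.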

If you want to see more information on the error, refer to this Stack Exchange post: https://stackoverflow.com/questions/48030717/scrapy-404-error-http-status-code-is-not-handled-or-not-allowed/48031332