Cody Stephenson
8,272 Points

I think the form is broken (needs updating on Treehouse's side).

I tried a bunch of troubleshooting before actually trying to submit the form manually. When I did that, I got:

"This form isn't set up yet. Do you own this website? If so, please login and create a form. Then update your HTML or JavaScript with the new form endpoint. More information here."

Chris Freeman
Treehouse Moderator 68,031 Points

Where did you get stuck? I was able to use the code from the video:

# setting up my venv
$  mkvirtualenv scrapy_tth
$  pip install scrapy
$  mkdir scraping_data
$  cd scraping_data/
# TTH material
$  scrapy startproject tthscrape
$  cd tthscrape/
$  cd tthscrape/spiders
## create formSpider.py
$  cd ../../
$  scrapy crawl horseForm

formSpider.py
from scrapy.http import FormRequest
from scrapy.spiders import Spider


class FormSpider(Spider):

    # Name used on the command line: scrapy crawl horseForm
    name = 'horseForm'

    start_urls = ['https://treehouse-projects.github.io/horse-land/form.html']

    def parse(self, response):
        # Values for the form fields on form.html
        formdata = {'firstname': 'chris',
                    'lastname': 'freeman',
                    'jobtitle': 'student',
                    }
        # Fill in the first form on the page and submit it
        return FormRequest.from_response(response, formnumber=0,
                                         formdata=formdata,
                                         callback=self.after_post)

    def after_post(self, response):
        # Called with the response to the form submission
        print("\n**********\n\nForm processed\n\n**********\n\n")
        print(response)
Cody Stephenson
8,272 Points

When I run scrapy crawl horseForm, I get a massive output without any of the expected portions. I think the relevant error is:

2021-06-29 15:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)
2021-06-29 15:04:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://formspree.io/robots.txt> (referer: None)
2021-06-29 15:04:01 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2021-06-29 15:04:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2021-06-29 15:04:01 [scrapy.core.engine] INFO: Closing spider (finished)

There are a few hundred more lines, but I don't think they are relevant. Is it expected that I would get the error when I enter information in the form manually?
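
With hundreds of lines of output, it can help to filter the log down to the requests that did not succeed. The following is only a rough sketch using the standard library: the regular expression assumes the "Crawled (status) <METHOD url>" format seen in the lines above, and failed_requests and sample_log are hypothetical names, not part of Scrapy.

```python
import re

# Matches Scrapy engine lines such as:
#   ... DEBUG: Crawled (404) <POST https://...> (referer: ...)
# The pattern is an assumption based on the log lines quoted in this thread.
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")


def failed_requests(log_lines):
    """Return (status, method, url) for every crawled response that is not 2xx."""
    failures = []
    for line in log_lines:
        match = CRAWLED.search(line)
        if match and not match.group(1).startswith("2"):
            failures.append((int(match.group(1)), match.group(2), match.group(3)))
    return failures


# Two of the lines from the output above, used as sample input.
sample_log = [
    "2021-06-29 15:03:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://treehouse-projects.github.io/horse-land/form.html> (referer: None)",
    "2021-06-29 15:04:01 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)",
]

for status, method, url in failed_requests(sample_log):
    print(status, method, url)
```

On the sample lines this leaves only the POST to formspree.io, which points at the form endpoint rather than the spider code.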

2 Answers

Chris Freeman
Treehouse Moderator 68,031 Points

Hey Cody Stephenson, I have to agree that the form is broken (or at least the setup is incomplete at formspree.io). Using the form on the website https://treehouse-projects.github.io/horse-land/form directly yields "This form isn't set up yet".

I manually hacked the form.html page to revert the change below to the form action, as shown in the GitHub history:

-        <form action="https://formspree.io/ken.alger+scrapy@teamtreehouse.com" method="POST">
+        <form action="https://formspree.io/content+scrapy@teamtreehouse.com" method="POST">

src: https://github.com/treehouse-projects/horse-land/commit/d559009a92a8de48ba7f2a62483fbd38060324ce#diff-c684eb10223c1ee8f91719ff3ea8b5756b5841ceb1111df54a52281e4d9a4174

When manually using "ken.alger" instead of "content" in the form action, formspree.io comes back with a valid response.

Tagging Ken Alger: does the form "content+scrapy@teamtreehouse.com" need to be set up?
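
FormRequest.from_response takes the submission URL from the form's action attribute, so the spider posts to whatever endpoint the page declares. As a quick way to see which endpoint that is, here is a minimal sketch using only the standard library. FormActionParser, form_actions, and the trimmed-down HTML string are hypothetical and for illustration only; the action value is the one from the diff above.

```python
from html.parser import HTMLParser


class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "form":
            self.actions.append(dict(attrs).get("action"))


def form_actions(html):
    """Return the list of form action URLs found in the given HTML."""
    parser = FormActionParser()
    parser.feed(html)
    return parser.actions


# Trimmed-down stand-in for form.html, using the current form action.
current = '<form action="https://formspree.io/content+scrapy@teamtreehouse.com" method="POST"></form>'
print(form_actions(current))
```

If the printed endpoint is one that formspree.io has not set up, the spider will get the same 404 as a manual submission.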

David Sampimon
12,024 Points

Bump: just leaving a comment here that I am running into the same issue in May 2022.

2022-05-06 12:49:47 [scrapy.core.engine] DEBUG: Crawled (404) <POST https://formspree.io/content+scrapy@teamtreehouse.com> (referer: https://treehouse-projects.github.io/horse-land/form.html)
2022-05-06 12:49:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://formspree.io/content+scrapy@teamtreehouse.com>: HTTP status code is not handled or not allowed
2022-05-06 12:49:47 [scrapy.core.engine] INFO: Closing spider (finished)
2022-05-06 12:49:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
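
The "Ignoring response <404 ...>" line comes from Scrapy's HttpError spider middleware, which by default drops non-2xx responses before they reach the callback. That would explain why the "Form processed" banner never prints. One possible tweak, assuming the settings.py generated by scrapy startproject, is to allow 404 through so the after_post callback still runs. This does not fix the form endpoint; it only makes the failure visible in the callback.

```python
# settings.py (sketch): let 404 responses reach spider callbacks
# instead of being filtered out by the HttpError middleware.
HTTPERROR_ALLOWED_CODES = [404]
```

With this in place, print(response) in the callback should show the 404 response from formspree.io rather than nothing at all.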
Chris Freeman
Treehouse Moderator 68,031 Points

It might help escalate the issue if you send a link to this forum page to help@teamtreehouse.com.