An Intelligent Spider (5:40) with Ken Alger
Forms are a big part of many websites. Scrapy provides a FormRequest class for handling them.
[MUSIC] We've managed to make a couple of spiders that were great for sites that don't require interaction. But many sites do indeed require some sort of interaction. For example, logging in to a site with a username and password requires a form submission. There are many different reasons for needing to work with forms when getting and scraping data. Let's head back into our code to take a look at some techniques.

Our Horse Land site is hosted on GitHub Pages, which doesn't support backend technologies, so we'll be using a bit of a workaround from Formspree to handle the form posts. Check the teacher's notes for additional information about formspree.io and how to get started with that.

If we look at our form page, we see that it's a pretty simple form with just a first name, last name, and a job title. Scrapy has a class called FormRequest, which allows for form processing. And, hold your horses, it's easy to use.

Let's mosey on over to our code and create a new spider. I'll create a new file, a Python file, and we'll call it formSpider. First, FormRequest will need to be imported: from scrapy.http import FormRequest. And we need to import Spider: from scrapy.spiders import Spider.

Next, we create a new class that inherits from Spider. We'll call it FormSpider and, as we've seen, we need to give our spider a name. We'll just call it horseForm. Then we define our start URLs, which, again, is a list. What's the URL for our form? We'll just cut and paste that in. This looks pretty familiar so far, I think.

Next, we define our parse method and the formdata we want to pass in. Let's use the developer tools in the browser to see what the form fields are called. Come over here, Developer Tools.
Down here in the form, we have firstname, lastname, and jobtitle, all lowercase and no spaces. So for firstname, my first name is Ken; lastname, Alger; and jobtitle is Teacher.

Now we need to return a FormRequest.from_response object: return FormRequest.from_response. We'll pass in the response, the number of the form on the page we're processing, formnumber, which is zero-based, then the form data we want, formdata=formdata, and then a callback for what to do next. We'll make a method here called after_post.

This passes the data we defined into the form and, by default, uses the submit button to submit our data. Then it will do whatever we define in the after_post method. Here we could do data saving, data processing, or further scraping tasks. For now, let's just print out that the form was processed, along with the response object itself. So we'll define after_post, with self, and again, that takes a response. We'll print, with a little formatting just so we can see it in the terminal, and we'll print the response. Let's just copy this line here. There we go.

All right, let's open a terminal window, go to our spiders folder, and have Scrapy run our crawler. If we look up here, great, we see that the spider found and submitted our form. In our case, it was posted to formspree.io for processing. Here's our printed information and our 200 response code.

I've included links in the teacher's notes about FormRequest as well. I'd encourage you to look at it, as it's a powerful tool for processing forms and can even be used to handle login forms.