Ruby

Andrew Cottage
20,536 Points

I'm stuck on how to integrate a Nokogiri scrape into my Rails application.

Here is my issue.

I wrote a Nokogiri scrape in Ruby that simply scrapes a page and prints the results to the screen.

require 'nokogiri'
require 'open-uri'
require 'rubygems'

url = 'http://disneyauditions.com/audition-calendar/'
data = Nokogiri::HTML(open(url))
auditions = data.css('.audtion')


auditions.each do |audition|
    puts audition.css('.name').text
    puts audition.css('.businessunit').text 
    puts audition.css('.location').text
    puts audition.css('.venue').text
    puts audition.css('.talent_type').text
    puts audition.css('.start_date').text
    puts audition.css('.start_time').text
    puts audition.css('.time_zone').text
end

I was able to implement it and display it by putting half the code in my controller, and the other half in the view.

Controller

    def list

    require 'nokogiri'
    require 'open-uri'
    require 'rubygems'


    url = 'http://disneyauditions.com/audition-calendar/'
    data = Nokogiri::HTML(open(url))
    @auditions = data.css('.audtion')

    end

View

<div class="col-md-8 col-md-offset-2">
  <table class="table table-striped">
    <thead>
      <tr>
        <th>Resort</th>
        <th>Type</th>
        <th>Venue</th> 
        <th>Location</th>
        <th>Date</th>
        <th>Time</th>
        <th>Zone</th>
      </tr>
    </thead>
    <% @auditions.each do |a| %>
      <% unit = a.css('.businessunit').text %>
      <% location = a.css('.location').text %>
      <% venue = a.css('.venue').text %>
      <% type = a.css('.talent_type').text %>
      <% date = a.css('.start_date').text %>
      <% time = a.css('.start_time').text %>
      <% time_zone = a.css('.time_zone').text %>
    <tbody>    
      <tr>
        <td><%= unit %></td>
        <td><%= type %></td>
        <td><%= venue %></td> 
        <td><%= location %></td> 
        <td><%= date %></td>
        <td><%= time %></td>
        <td><%= time_zone %></td> 
      </tr>
    <%  end %>
    </tbody>
  </table>
</div>

I'm truly stuck on this next part.

I would like to scrape the data and put it into the database. Then, in my view, I would like to display the data by simply reading it from the database.

As my site is set up now, every time I visit the page it scrapes the website.

Questions:

a. Where should I put the code in my rails app to scrape the website?
b. How do I edit my code to insert scraped data into the database?
c. How do I schedule my scrape to run at a certain time daily?
d. How do I display my scraped data in my view once it's in the database?

Thanks for all of your help in advance. I have spent hours searching the web trying to figure this out.

I'm feeling so defeated ARGH!

6 Answers

Nick Fuller
9,027 Points

Wooooah there Andrew!

First of all, this is fun stuff you're doing, and that's quite a loaded post you have here. But I love what you're doing; it sounds fun. You have four primary questions, and they aren't small.

a. Where should I put the code in my rails app to scrape the website?

First, I suggest looking at this book. I have no affiliation with the author or the publisher, but it's simply an amazing book that will really help a person of your skill set reach the next level. http://www.poodr.com/

Next, this sounds like business logic, right? What is a model? It's an object that holds business logic! Rails usually treats a model as an object that talks to your database, which is true, but you can also create models that don't deal with your database at all!

For instance, what if you created a file called disney_scraper.rb and put it in your models directory? (FYI, I haven't tested this; I'm just trying to demonstrate.)

require 'nokogiri'
require 'open-uri'

class DisneyScraper

  attr_reader :url, :data

  def initialize(url)
    @url = url
  end

  # `class` is a reserved word in Ruby, so the parameter is named `selector`.
  def get_class_items(selector)
    data.css(selector)
  end

  def data
    @data ||= Nokogiri::HTML(open(url))
  end

end

With this, in your controller you can now do something like:

def list
  @disney_scrape = DisneyScraper.new('http://disneyauditions.com/audition-calendar/')
  @auditions = @disney_scrape.get_class_items('.audition')
end

b. How do I edit my code to insert scraped data into the database?

This still doesn't save them to the database, but you can create a Rails model called, say, DisneyAudition (Rails model names are singular by convention), and then have your DisneyScraper object work with that model to save your values. I'm kind of just spouting ideas here, because you're getting into some fun design concepts, and with Ruby there are lots of ways to do things!
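
To make that a bit more concrete, here is a rough sketch of the save step. It assumes each scraped row has already been reduced to a plain hash of text values (the keys mirror the CSS classes in your scrape), and it uses a made-up in-memory AuditionStore where a real app would call an ActiveRecord model, perhaps with something like find_or_create_by. The sample values are invented. The point is only the shape: clean up the text, then add a row only if an identical one isn't already stored, so repeated scrapes don't pile up duplicates.

```ruby
# Hypothetical stand-in for a database-backed model; in a Rails app this
# role would be played by an ActiveRecord model instead.
class AuditionStore
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Adds the attributes unless an identical row is already stored.
  # Returns true when a row was added, false when it was a duplicate.
  def save_unless_present(attrs)
    return false if @rows.include?(attrs)

    @rows << attrs
    true
  end
end

# Field names taken from the CSS classes used in the scrape.
FIELDS = %i[name businessunit location venue talent_type
            start_date start_time time_zone].freeze

# Collapses runs of whitespace and trims each value, since scraped text is
# often messy. Fields missing from the input become empty strings.
def normalize(raw)
  FIELDS.each_with_object({}) do |field, attrs|
    attrs[field] = raw.fetch(field, '').to_s.gsub(/\s+/, ' ').strip
  end
end

store = AuditionStore.new

# Invented sample rows: the second is the first with stray whitespace.
scraped = [
  { name: 'Performer', location: 'Orlando, FL', start_date: 'June 1' },
  { name: "  Performer\n", location: 'Orlando,   FL', start_date: 'June 1 ' }
]

scraped.each { |raw| store.save_unless_present(normalize(raw)) }
puts store.rows.length
```

Running the scrape twice against the same page would leave the stored rows unchanged, which matters once the scrape runs on a schedule.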

c. How do I schedule my scrape to run at a certain time daily?

This is also fun! Check out Delayed Job!

https://github.com/collectiveidea/delayed_job

and from the immortal Ryan Bates:

http://railscasts.com/episodes/171-delayed-job-revised

d. How do I display my scraped data in my view once it's in the database?

If you end up creating an ActiveRecord model to save the data into your database, you can use the same model to pull the data back out, just like any other Rails model :)
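
As a small illustration of that: once the rows live in the database, the view no longer needs to call .css on anything; it just reads attributes off each record. The sketch below uses Ruby's standalone ERB and a Struct as a stand-in for the records a model query would return (the AuditionRow name and the sample values are invented), so it runs outside Rails.

```ruby
require 'erb'

# Stand-in for records loaded from the database. The fields follow the
# columns shown in the table from the question.
AuditionRow = Struct.new(:businessunit, :talent_type, :venue, :location,
                         :start_date, :start_time, :time_zone)

# Invented sample data; a controller would load real records instead.
auditions = [
  AuditionRow.new('Example Resort', 'Dancer', 'Example Hall',
                  'Orlando, FL', '2014-06-01', '10:00 AM', 'EST')
]

# The template only reads attributes; no scraping happens at render time.
# Note: Rails views escape HTML automatically, plain ERB does not.
TEMPLATE = <<~HTML
  <tbody>
  <% auditions.each do |a| %>
    <tr>
      <td><%= a.businessunit %></td>
      <td><%= a.talent_type %></td>
      <td><%= a.venue %></td>
      <td><%= a.location %></td>
      <td><%= a.start_date %></td>
      <td><%= a.start_time %></td>
      <td><%= a.time_zone %></td>
    </tr>
  <% end %>
  </tbody>
HTML

html = ERB.new(TEMPLATE).result_with_hash(auditions: auditions)
puts html
```

In the real app the loop body would look the same; only the source of the auditions collection changes.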

I hope this helps! It's a big project man, keep at it, this sounds fun!

Andrew Cottage
20,536 Points

Thanks Nick Fuller. I haven't had time to implement any of your suggestions yet, but I will be trying it this coming week. It surely is fun and will be a massive undertaking to get it up to the state that I want it to be. I appreciate your help and will update this ticket when I reach a solution.

Andrew Cottage
20,536 Points

After trying Nick Fuller's suggestions, I still have not been able to figure this out. I think that my issue stems from not having a fundamental understanding of how Rails relates models, views, and controllers.

I tried the following:

To start, I currently have:

Controller:

class AuditionsController < ApplicationController


    def list

        @disney_audition_scrape = Audition.new('http://disneyauditions.com/audition-calendar/')
        @disney_auditions = @disney_audition_scrape.get_class_items('.audition')
    end
end

Model:

class Audition < ActiveRecord::Base
    require 'nokogiri'
    require 'open-uri'
    require 'rubygems'

    attr_reader :url, :data, :selector

    def initialize(url)
        @url = url
    end

    def get_class_items(selector)
        data.css(selector)
    end

    def data
        @data = Nokogiri::HTML(open(url))
    end
end

View:

<div class="col-md-8 col-md-offset-2">
  <table class="table table-striped">
    <thead>
      <tr>
        <th>Resort</th>
        <th>Type</th>
        <th>Venue</th> 
        <th>Location</th>
        <th>Date</th>
        <th>Time</th>
        <th>Zone</th>
      </tr>
    </thead>
    <% @disney_auditons.each do |a| %>
      <% unit = a.css('.businessunit').text %>
      <% location = a.css('.location').text %>
      <% venue = a.css('.venue').text %>
      <% type = a.css('.talent_type').text %>
      <% date = a.css('.start_date').text %>
      <% time = a.css('.start_time').text %>
      <% time_zone = a.css('.time_zone').text %>
    <tbody>    
      <tr>
        <td><%= unit %></td>
        <td><%= type %></td>
        <td><%= venue %></td> 
        <td><%= location %></td> 
        <td><%= date %></td>
        <td><%= time %></td>
        <td><%= time_zone %></td> 
      </tr>
    <%  end %>
    </tbody>
  </table>
</div>

When I try to load the page I get the following:

Showing /home/andrew/Projects/vsrb/app/views/auditions/list.html.erb where line #15 raised:

undefined method `each' for nil:NilClass
Extracted source (around line #15):
12            <th>Zone</th>
13          </tr>
14        </thead>
15        <% @disney_auditons.each do |a| %>
16          <% unit = a.css('.businessunit').text %>
17          <% location = a.css('.location').text %>
18          <% venue = a.css('.venue').text %>

Rails.root: /home/andrew/Projects/vsrb

Application Trace | Framework Trace | Full Trace
app/views/auditions/list.html.erb:15:in `_app_views_auditions_list_html_erb__3705441443227933535_12536740'
  1. How does a particular controller know what model to talk to? Does it just talk to them all?

  2. Do Models hold business logic and control communication with the database?

  3. If I create a class inside a model, from which controller can I access that?

Again any and all help is appreciated!

Nick Fuller
9,027 Points

Do you have this on GitHub?

Doug Tucker
7,437 Points

Are you still working on this project? I would be interested in learning about what you've done if you have made any progress.

Andrew Cottage
20,536 Points

Hello, sorry I never responded to this post. Yes, I figured out how to get it integrated, thanks Nick Fuller. And yes, the project is on GitHub. It's actually part of the back end that I built for my girlfriend's actress site, victoriaspringer.com

github.com/lambbear/vsrb

I basically just put the scrape logic into a rake task, then I call that rake task with Heroku Scheduler and have it run every 15 minutes or so.
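
A rake task of that kind could be shaped roughly like this. This is a simplified sketch, not the code from the repo: the AuditionScrape module and the auditions:scrape task name are placeholders, and the module only counts its runs so the example works offline. In a Rails app the task would normally sit in a file under lib/tasks and depend on :environment so the models are loaded.

```ruby
require 'rake'

# Hypothetical placeholder for the real work (fetch the page, parse it,
# save the rows). It only counts how often it ran.
module AuditionScrape
  @runs = 0

  class << self
    attr_reader :runs

    def run
      @runs += 1
    end
  end
end

# In Rails this block would live in something like lib/tasks/auditions.rake
# and usually be declared as `task scrape: :environment do ... end`.
namespace :auditions do
  desc 'Scrape the audition calendar and store the results'
  task :scrape do
    AuditionScrape.run
  end
end

# A scheduler would run `rake auditions:scrape`; here it is invoked directly.
Rake::Task['auditions:scrape'].invoke
puts AuditionScrape.runs
```

Keeping the scrape in a task means the page view never triggers a scrape; the view only reads whatever the last scheduled run stored.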