boooz edited this page May 17, 2012 · 2 revisions

Below is an example of including Skyscraper in a class:

class Sample
  include Skyscraper

  settings limit: 10, delay: { after: 5, time: 1 }, encoding: "utf-8"

  pages ["http://google.com", "https://github.com", "http://rubyonrails.org"]
  # The pages method also accepts a block as its argument; inside the block you can use the Skyscraper::fetch method to build the list of pages from a website dynamically

  field :html, "html", :html
  field :title, "title" do |node|
    "'#{node.text}'"
  end
  field :first_link, "body" do |node|
    "'#{node.first("a").href}'"
  end
  # The field method takes the following arguments:
  # field_name => name that the record will have in the results table
  # selector => CSS selector of the element to fetch, so it can even look like "tag #id.some_class"
  # optionally, a symbol naming a node method, or a block; if neither is provided, the text method is called on the node

  after_each do |result|
    page = Page.new 
    page.title       = result[:title]
    page.html        = result[:html]
    page.first_link  = result[:first_link]
    page.save
  end

  after_all do 
    puts "Job done"
  end
end

Sample.new.fetch # runs the code above, applies the provided callbacks, and returns an array of results

After Skyscraper is included in a class, the class gains the following methods:

pages(string|array|block)

The pages method sets the list of pages that Skyscraper will visit.

It accepts strings, arrays, and blocks as arguments. A block should return an array or a string with the list of pages. You don't need to flatten the returned array; this happens automatically.
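
The flattening behaviour described above can be pictured with a small plain-Ruby sketch. The `normalize_pages` helper is hypothetical and is not part of Skyscraper; it only illustrates how a string, a nested array, and a block result could all be reduced to one flat list of URLs.

```ruby
# Hypothetical helper (NOT Skyscraper's implementation): reduces strings,
# nested arrays, and a block's return value to a single flat list of URLs.
def normalize_pages(*args, &block)
  sources = args
  # A block may return a string or a (possibly nested) array.
  sources += [block.call] if block
  sources.flatten.compact
end

single     = normalize_pages("http://google.com")
nested     = normalize_pages(["http://google.com", ["https://github.com"]])
from_block = normalize_pages { [["http://rubyonrails.org"], "https://github.com"] }
```

In each case the result is a one-level array of strings, which is why a block passed to pages can return nested arrays without extra work.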

field(name, selector, attribute_name|block|nil)

The field method adds the given field to the Skyscraper results array.

It accepts the following arguments:

  • name - name of the field in the Skyscraper results array
  • selector - CSS selector; the first element that matches it will be fetched
  • attribute_name|block|nil - either a symbol with an attribute name (e.g. :html, :text, :download) or a block that receives two arguments: the matched node element and the current page object. If nothing is provided, the :text method is called on the node by default
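
The three forms of the last argument can be sketched in plain Ruby as follows. Everything here is an assumption made for illustration: `FakeNode` stands in for a parsed node, `extract_field` is a hypothetical helper, and the page is represented by a plain string. It is not how Skyscraper itself is implemented.

```ruby
# Stand-in for a parsed node; real nodes come from Skyscraper (assumption).
FakeNode = Struct.new(:text, :html)

# Hypothetical dispatcher mirroring the documented rules:
#   block given  -> call it with the node and the page
#   symbol given -> call that method on the node
#   nothing      -> fall back to :text
def extract_field(node, page, attribute = nil, &block)
  return block.call(node, page) if block
  node.public_send(attribute || :text)
end

node = FakeNode.new("Hello", "<p>Hello</p>")
page = "http://example.com" # placeholder for the current page object

default_value = extract_field(node, page)
html_value    = extract_field(node, page, :html)
block_value   = extract_field(node, page) { |n, p| "'#{n.text}' from #{p}" }
```

Because Ruby blocks ignore surplus arguments, a block written as `do |node| ... end` works just as well as one that also takes the page.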

after_all

after_each

settings

fetch

continue
