links for 2008-02-27

Posted by Justin Weiss Wed, 27 Feb 2008 06:35:18 GMT

Bootstrapping content with Hpricot

Posted by Justin Weiss Tue, 26 Feb 2008 20:56:35 GMT

On my latest project, I discovered I had to pre-populate the project's database with existing content. Jon Udell just posted about how much of a waste of time this can be in some circumstances, but in this case, Hpricot and database migrations made it easy. This wouldn't be a solution I'd use if I needed the data as anything beyond a one-off bootstrap, but in this case it worked really well.

Hpricot, for those who don't know, is an HTML parser for Ruby that's fun to use. When I was first learning Ruby, most of the simplest yet useful projects I could come up with used Hpricot to grab content off of websites and format or combine it in different ways. Its syntax looks like this:


require 'hpricot'
require 'open-uri'

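# link holds the URL of the page to scrape
uri = URI.parse(link)
# uri.open comes from open-uri; bare open(uri) stopped working in Ruby 3.0
doc = Hpricot(uri.open)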

name = (doc/"li.active a").inner_html
page_title = (doc/"title").inner_html
body = (doc/"#content_body").html

In this example, Hpricot uses CSS selectors to grab different pieces of content out of the page at the URL stored in link. The nice thing about CSS selectors here is that the code tends to be less fragile than screen scrapers that depend on the exact structure of the page.

Page scraping can be a frustrating art, especially if the page layout changes or if pages are inconsistent, or have unique properties. Luckily, in this case, I only had to get it right once, and even then, I didn't have to get it completely right. I used this four-stage process:

  1. Use Hpricot to get as much data off the page and into our data structures as possible.
  2. Persist this data to the database, and make appropriate changes that Hpricot missed, or couldn't catch.
  3. Dump the database to a file, and use it to bootstrap our production database.
  4. Repeat until finished.

Rails database migrations made this relatively easy. I ended up with three migrations. The first created the structure of the database. The second loaded the current page data from the dump file. The third grabbed a few pages I still needed to parse. That left me with data I could tweak and then dump, overwriting the old dump file with one containing all the page data (including the stuff I had just tweaked). I could then blow away the database and repeat until I didn't have any more pages to parse.
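The load, scrape, tweak, and re-dump cycle can be sketched without Rails. This is a minimal sketch and not the actual migrations: it assumes a YAML dump file (the format I used isn't stated above), and scrape_pages, dump_pages, and load_pages are hypothetical names, with hard-coded data standing in for the Hpricot step.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical stand-in for the scraping step. In a real run this data
# would come from Hpricot; it is hard-coded so the sketch runs on its own.
def scrape_pages
  [
    { 'name' => 'About',   'page_title' => 'About Us', 'body' => '<p>Draft</p>' },
    { 'name' => 'Contact', 'page_title' => 'Contact',  'body' => '<p>Call us</p>' }
  ]
end

# Write the current records to the dump file, replacing any earlier dump.
def dump_pages(pages, path)
  File.write(path, pages.to_yaml)
end

# Read the records back, the way a "load the dump" migration would.
# A missing file means nothing has been dumped yet.
def load_pages(path)
  File.exist?(path) ? YAML.load_file(path) : []
end

Dir.mktmpdir do |dir|
  dump_path = File.join(dir, 'pages.yml')

  pages = load_pages(dump_path) + scrape_pages  # first round: the dump is empty
  pages.first['body'] = '<p>Fixed by hand</p>'  # manual tweak the parser missed
  dump_pages(pages, dump_path)                  # overwrite the dump with everything so far

  reloaded = load_pages(dump_path)              # the next round starts from here
  puts reloaded.map { |page| page['name'] }.inspect
end
```

The point of the round trip is that hand edits survive: anything fixed after scraping ends up in the dump, so the scraper never has to reproduce it.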

This worked perfectly. I didn't have to spend time getting my Hpricot parsing perfect, because I could modify the resulting data using our CMS and re-dump, and I was left with a dump of all the data I needed to dynamically generate these formerly mostly static pages.

links for 2008-02-22

Posted by Justin Weiss Fri, 22 Feb 2008 06:31:01 GMT

Counterintuitive thinking

Posted by Justin Weiss Thu, 21 Feb 2008 18:15:00 GMT

Like all good programmers, I read Paul Graham's essays. One of his more recent ones got a lot of attention, especially these two quotes:

Here it is: I like to find (a) simple solutions (b) to overlooked problems (c) that actually need to be solved, and (d) deliver them as informally as possible, (e) starting with a very crude version 1, then (f) iterating rapidly.

and

So when you look at something like Reddit and think "I wish I could think of an idea like that," remember: ideas like that are all around you. But you ignore them because they look wrong.

The first is interesting because these are the same principles most agile programmers hold, but it's the second that captures what I found insightful about the essay. There's another paragraph in the essay that explains it better:

I'd noticed, of course, that people never seemed to grasp new ideas at first. I thought it was just because most people were stupid. Now I see there's more to it than that. Like a contrarian investment fund, someone following this strategy will almost always be doing things that seem wrong to the average person.

When I read this section, I thought "yeah, but that's the point." Which is why I liked what came next:

As with contrarian investment strategies, that's exactly the point.

Something that I've been interested in lately is the counterintuitive. In a lot of cases, I've found that by doing counterintuitive things, I've had way more success than I had by following everyone else.

There are a few counterintuitive rules of thumb that are particularly interesting:

The problem with the counterintuitive is that once it is successful, it often stops being counterintuitive and loses the power of originality it once had. The Red Queen is a great book about this, but I've noticed it in businesses that try to predict market trends (once a trend is identified as a trend, it's exploited until it is no longer a trend) and in technology (as Graham outlines). Still, looking for ways in which counterintuitive thinking can lead to success will usually, if nothing else, bring you one step ahead of everyone else.

links for 2008-02-20

Posted by Justin Weiss Wed, 20 Feb 2008 06:31:51 GMT

links for 2008-02-17

Posted by Justin Weiss Sun, 17 Feb 2008 06:26:44 GMT

links for 2008-02-15

Posted by Justin Weiss Fri, 15 Feb 2008 06:31:39 GMT

links for 2008-02-14

Posted by Justin Weiss Thu, 14 Feb 2008 06:34:15 GMT

links for 2008-02-13

Posted by Justin Weiss Wed, 13 Feb 2008 06:27:41 GMT

Sidebars are better than components

Posted by Justin Weiss Wed, 13 Feb 2008 01:10:00 GMT

This article made it across my RSS reader today. I ran into my own problem with this while writing a custom CMS for work. We wanted to have reusable components that could be added to CMS pages, which could take various parameters, could be cached, and could be viewed in different ways given a size. I investigated Rails components at work, but noticed that using those is discouraged by the Rails community.

My investigation brought me to Typo’s sidebar model, which I used as the basis for the model we ended up using for the prototype of the project. The ultra-simplified version of the model works like this:

We have a Sidebar base class, which inherits from ActiveRecord::Base, and each concrete sidebar inherits from this Sidebar class. That gives us something like this:

class Sidebar < ActiveRecord::Base
  serialize :config

  class << self

    def params
       @params ||= []
    end

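    # Declares a parameter: records it in params, then defines a reader
    # and a writer backed by the serialized config hash.
    def param(name, type, options = {})
      params << options.merge({:name => name, :type => type})
      self.send(:define_method, name) do
        # config can be nil on a record that has never stored a value
        (self.config || {})[name] || options[:default]
      end
      self.send(:define_method, "#{name}=") do |value|
        self.config ||= {}
        self.config[name] = value
      end
    end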
  end
end

class StaticTextSidebar < Sidebar
  param :content, :text, :default => "Hello, World!"
end

So now we have a way of defining sidebars and their parameters. The metaprogramming in the Sidebar base class allows us to programmatically query the parameters declared in a Sidebar. This will be important later. For now, we still need to declare the view of a sidebar, so we do it in _static_text_sidebar.rhtml:

<%= sidebar.content %>

Now, we add a helper to application_helper.rb to render the sidebar:

def render_sidebar(sidebar)
  render :partial => sidebar.class.name.underscore, :locals => { :sidebar => sidebar }
end

and then we can call render_sidebar in any of our views on an instance of a sidebar to render it. It’s not perfect, but it’s good enough for a prototype!
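The helper works because the partial name is derived from the class name. Here is a plain-Ruby sketch of that mapping. In the helper above the conversion is done by ActiveSupport's underscore; partial_name_for and partial_file_for are hypothetical names, and the regex is an approximation that assumes simple CamelCase class names with no namespaces or acronyms.

```ruby
# Stand-in class so the sketch runs without Rails or a database.
class StaticTextSidebar; end

# Approximates String#underscore for simple CamelCase names:
# insert "_" at each lower-to-upper boundary, then lowercase.
def partial_name_for(klass)
  klass.name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Rails looks up a partial by prefixing its name with an underscore.
# The .rhtml extension matches the view file named earlier.
def partial_file_for(klass)
  "_#{partial_name_for(klass)}.rhtml"
end

puts partial_file_for(StaticTextSidebar)
```

So adding a new kind of sidebar means adding a subclass and a matching partial, with no changes to the helper.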

From here, we have a very basic reusable model-view framework that we can include in any of our pages. Sidebar instances can be associated with content on a page to be displayed, and their configuration can be serialized to the database along with the items they display.

Creating and configuring sidebars can be done programmatically, by generating a form based on the parameters a sidebar takes and placing that form data into the sidebar, the same way one would with a standard ActiveRecord object. Their parameters can be validated using standard Rails validations and the result of the render_sidebar call can be cached.
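To make the form generation concrete, here is a minimal sketch that runs without Rails. It is not the code from our CMS: SketchSidebar is a plain-Ruby stand-in for the Sidebar class above, with an ordinary hash in place of the serialized column, and sidebar_form_fields and apply_form_data are hypothetical helpers.

```ruby
require 'erb'

# Plain-Ruby stand-in for Sidebar: same param macro, no ActiveRecord.
class SketchSidebar
  class << self
    def params
      @params ||= []
    end

    def param(name, type, options = {})
      params << options.merge(:name => name, :type => type)
      define_method(name) { config.fetch(name, options[:default]) }
      define_method("#{name}=") { |value| config[name] = value }
    end
  end

  # Ordinary hash standing in for the serialized config column.
  def config
    @config ||= {}
  end
end

class SketchTextSidebar < SketchSidebar
  param :content, :text, :default => 'Hello, World!'
  param :title, :string
end

# Hypothetical helper: builds one form field per declared parameter,
# using a textarea for :text parameters and a text input otherwise.
def sidebar_form_fields(sidebar)
  sidebar.class.params.map do |spec|
    value = ERB::Util.html_escape(sidebar.send(spec[:name]).to_s)
    field = "sidebar[#{spec[:name]}]"
    if spec[:type] == :text
      %(<textarea name="#{field}">#{value}</textarea>)
    else
      %(<input type="text" name="#{field}" value="#{value}" />)
    end
  end
end

# Hypothetical helper: copies submitted values onto the sidebar,
# ignoring any key that is not a declared parameter.
def apply_form_data(sidebar, form_data)
  sidebar.class.params.each do |spec|
    key = spec[:name].to_s
    sidebar.send("#{spec[:name]}=", form_data[key]) if form_data.key?(key)
  end
  sidebar
end

sidebar = SketchTextSidebar.new
puts sidebar_form_fields(sidebar)
apply_form_data(sidebar, 'title' => 'News', 'unknown' => 'ignored')
puts sidebar.title
```

Because the form is driven entirely by the declared parameters, a new sidebar type gets an editing form for free as soon as its param lines are written.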

This basic idea, with a little bit of work, can easily form the basis for a simple reusable component architecture, and we’ve been having a ton of success with it so far.