Reworking Net::SFTP to handle large file downloads

Posted by Matt Parrish Tue, 26 Jun 2007 19:13:00 GMT

I’m writing an application that downloads access logs from our production servers and runs the AWStats package against them to create the statistics web pages. This process is set up as a Rake task that uses the Net::SFTP library used by Capistrano, written by Jamis Buck. There is also a front-end Rails application to manage each of the applications to be retrieved. Everything was working great until I tried to grab a 550MB file from one of our servers. Net::SFTP choked as it ran out of memory.

It turns out that the command:

    sftp.get_file log_file, local_file

ends up putting the whole file into memory, which is fine for small files, but not the large one that I was trying to download. Luckily it wasn’t too bad to refactor my class. Here’s the new code to achieve the same effect as the above sftp.get_file command.

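        # Look up the remote file size so we know when to stop.
        stat = sftp.stat( log_file )
        offset = 0
        file_length = stat.size
        length = 64 * 1024 * 1024   # read at most 64MB per request
        File.open(local_file, File::CREAT|File::TRUNC|File::RDWR, 0644) do |f|
          # Open the remote handle once and reuse it for every chunk.
          sftp.open_handle(log_file) do |handle|
            while (offset < file_length)
              data = sftp.read(handle, :length => length, :offset => offset)
              # Stop if the server returns nothing (e.g. the file shrank),
              # otherwise the loop would never finish.
              break if data.nil? || data.empty?
              f.write(data)
              offset += data.length
            end
          end
        end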

This downloads the file in 64MB increments, using only that much memory at any time.
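
The same read-at-an-offset loop can be sketched independently of Net::SFTP. In the example below, `copy_in_chunks` and its `reader` callable are hypothetical names, not part of the library; the reader merely stands in for `sftp.read` and is assumed to return nil or an empty string once there is nothing left to read.

```ruby
require "stringio"

# Hypothetical helper (not part of Net::SFTP): copies up to `total` bytes into
# `out` by repeatedly asking `reader` for at most `chunk_size` bytes at `offset`.
def copy_in_chunks(reader, out, total, chunk_size)
  offset = 0
  while offset < total
    data = reader.call(offset, chunk_size)
    break if data.nil? || data.empty?   # nothing more to read, so stop
    out.write(data)
    offset += data.bytesize
  end
  offset                                # number of bytes actually copied
end

# Stand-in for the remote file: a 10-byte string read 4 bytes at a time.
source = "0123456789"
reader = lambda { |offset, length| source.byteslice(offset, length) }
sink   = StringIO.new
copied = copy_in_chunks(reader, sink, source.bytesize, 4)
```

Only one chunk is held in memory at a time, so the chunk size is the knob that trades memory for the number of round trips.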

RadiantOnRails released on RubyForge

Posted by Matt Parrish Thu, 31 May 2007 19:11:00 GMT

RadiantOnRails is a Radiant extension I created to allow a Rails application to co-exist with Radiant, giving the developer the best of both (dynamic and static) worlds. You can now visit the new project page on RubyForge.

This extension will be a major piece for the website I’m currently working on, RealIdaho.com. Most of the pages for the site just display static content about cities, but there are other portions of the site that will be fully-dynamic Rails pages. This extension allows me to combine both portions of the site into one application so we can develop the Rails pages for displaying the data-driven pages and still leverage the wonderful user interface created by the Radiant team. This will allow the realtor to make content changes without me having to make code changes and redeploy.

RadiantOnRails currently allows Radiant snippets to be inserted into Rails views and the next step is to allow the Rails views to use the Radiant layouts so that the look ‘n’ feel of the site is consistent, while keeping the views DRY. I’ll also be working with Loren Johnson to make Radiant available as a plugin which will make integrating Radiant with Rails even easier.

Switching from Pound to Nginx

Posted by Matt Parrish Tue, 05 Dec 2006 20:08:00 GMT

I just switched some Ruby on Rails apps I’m running from Pound to Nginx based on the results from some articles I’ve read online. The two biggest advantages of Nginx are (1) its raw performance and (2) its ability to serve static files, which is great for running Capistrano’s disable_web command to show a maintenance page when redeploying an application.

At work, we’re working on a standard Ruby on Rails setup and are currently investigating two options. The first is a Mongrel cluster running behind Nginx, as I described above. The second option is fronting the Mongrel cluster with Lighttpd. Since the 1.4.x series of Lighttpd is known to have some issues with its mod_proxy implementation, we would use Pen until a stable 1.5 version is released.

I’ll post another article once we have finished our evaluation and chosen which option we’re going to deploy at work. Stay tuned…

Switched to pure ruby ldap library

Posted by Matt Parrish Mon, 20 Nov 2006 19:58:00 GMT

I wrote an article a while back about using the Ruby/LDAP library to handle LDAP authentication in Ruby on Rails. I just finished swapping out the LDAP client library in that application from Ruby/LDAP to ruby-net-ldap. The problems with Ruby/LDAP are that it isn’t a gem, so installation is a bit more difficult, and it relies on a common LDAP library, like OpenLDAP, to already be installed on the system. The ruby-net-ldap library is written in pure Ruby, so no other library needs to be installed on the system.

Here is the new code that performs the authentication:

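require "net/ldap"

class User < ActiveRecord::Base
  # Returns the matching User when the LDAP bind succeeds, false when it
  # fails, and nil when the credentials are blank.
  def self.authenticate(login, password, host, port)
    # Skip blank credentials: many LDAP servers treat an empty password as an anonymous bind.
    if login.to_s.length > 0 and password.to_s.length > 0
      ldap = Net::LDAP.new
      ldap.host = host
      ldap.port = port
      # auth is a regular method taking the DN and the password, not a setter.
      ldap.auth "cn=#{login},cn=users,o=xyz...", password
      if ldap.bind
        return find(:first, :conditions => ['username=?', login])
      else
        return false
      end
    end
  end
end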

ActionMailer email performance on site5

Posted by Matt Parrish Thu, 09 Nov 2006 19:52:00 GMT

The site I am working on, RealIdaho.com, is hosted by site5. There are some forms that, when filled out by the user, send confirmation emails to the user, to us internally, and to the realtor’s cellphone. We started getting some reports that the confirmation page wouldn’t come up, and users were resubmitting the form many times. I discovered a few things while investigating that may be useful to others.

The first thing I noticed is that the page would time out after 15 seconds and just return a blank page. Not very friendly, and it definitely explains why the users were unsure whether the form had been submitted properly. The site, however, continued to process the request and eventually got all the emails out. I contacted the site5 support team and asked 1) why the page was timing out after only 15 seconds, and 2) why sending email was so slow.

Here’s the answer I got for item 1:

“Changing the timeout would affect the global settings for apache. This results in a significant increase in the number of open connections that apache has and degrades server performance. I can ask about this but I don’t think it will be possible to increase the timeout.”

While I understand their reasoning, 15 seconds seems awfully short. I believe the default for Apache is 5 minutes. Hopefully, they’ll consider increasing that amount to at least 30-60 seconds. While I never want a page that takes more than 5 seconds, I’m sure we’ll have some pages that do take longer, especially ones that need to communicate with external systems.

Regarding number 2, the support technician suggested using sendmail directly instead of sending email over SMTP, as that should be much faster. So, I went into RAILS_ROOT/config/environments/production.rb and added the following line:

ActionMailer::Base.delivery_method = :sendmail

Here are the performance numbers:

Previous method (SMTP):

Sending Mail to user (25.22760)
Sending Mail to internal (31.11207)
Sending Mail notification (20.70688)

Current method (Sendmail):

Sending Mail to user (0.16841)
Sending Mail to internal (0.28479)
Sending Mail notification (0.24701)

Wow, that’s quite a difference! When I tested the form submission, the confirmation page appeared almost immediately. Now that’s more like it. Now I just need to see about increasing the timeout.
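
To put the numbers side by side, here is a small sketch that totals the timings listed above. It assumes the three messages are sent one after another within the same request, which is my reading of the behavior rather than something I measured separately. Under that assumption the SMTP path spends roughly 77 seconds per submission, far beyond the 15-second timeout, while sendmail needs about 0.7 seconds.

```ruby
# Timings in seconds, copied from the log output above.
smtp     = [25.22760, 31.11207, 20.70688]
sendmail = [0.16841, 0.28479, 0.24701]

# Assumption: the mails are sent sequentially, so the per-request cost is the sum.
smtp_total     = smtp.sum
sendmail_total = sendmail.sum
speedup        = smtp_total / sendmail_total

puts format("SMTP total:     %.2f s", smtp_total)
puts format("Sendmail total: %.2f s", sendmail_total)
puts format("Speedup:        %.0fx", speedup)
```

Even the fastest single SMTP delivery (about 20.7 seconds) is longer than the timeout on its own, so no reordering of the mails would have helped.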

Error working with large YAML files

Posted by Matt Parrish Sun, 13 Aug 2006 18:34:00 GMT

As part of my Application migration project, I need to pre-populate the new database with zip code data. The Rails Recipes book (very useful) has a nice recipe on extracting fixtures from live data. So, I’ve created a zips.yml file that contains all the zip code data that I can insert into the new database. However, when I try to load the fixture using this very cool rake task from Technoweenie, the YAML library throws the following exception: SystemStackError: stack level too deep.

It is possible to work around the error by increasing the stack limit on the command line. On Mac OS X (and probably Linux/Unix), the following command can be run before running the rake task: ulimit -s 32768. This increases the default stack limit to 32MB, which should be enough, unless the YAML file is really large, I suppose.
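
To see which limit a Ruby process actually inherited from the shell, the limit can be read back from inside Ruby. This is a minimal sketch for Unix-like systems; `limit_in_kb` is a hypothetical display helper, and the only library call used is `Process.getrlimit`, which reports bytes where `ulimit -s` reports kilobytes.

```ruby
# Process.getrlimit returns [soft_limit, hard_limit] in bytes.
soft, hard = Process.getrlimit(Process::RLIMIT_STACK)

# Hypothetical helper for display: render a limit in kilobytes, as `ulimit -s` does.
def limit_in_kb(bytes)
  bytes == Process::RLIM_INFINITY ? "unlimited" : "#{bytes / 1024} KB"
end

puts "soft stack limit: #{limit_in_kb(soft)}"
puts "hard stack limit: #{limit_in_kb(hard)}"

# The argument to `ulimit -s 32768` is in kilobytes, which works out to 32MB.
requested_bytes = 32_768 * 1024
```

Running this from the same shell, before and after the ulimit command, is a quick way to confirm the new limit reached the process that will load the fixture.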

Does anyone know if somebody is working on fixing the YAML library to be nicer to the stack?

Rails Migrations - Dropping default constraint on column

Posted by Matt Parrish Sun, 13 Aug 2006 18:31:00 GMT

I’m working on a project for a client to convert a website from PHP to Ruby on Rails. The database is also changing from Microsoft SQL Server to MySQL, along with several schema redesigns.

One thing I wanted to do to our new db was change a foreign key column from NOT NULL DEFAULT 0 to allow nulls and remove the default. Currently, the organizations table has a record with ID = 0 to signify a blank organization. Users that do not belong to an organization have a foreign key to that Zero-record. Well, a better way to represent that is for the foreign key column in users to simply be NULL and ditch the Zero-record in the organizations table.

I have created a Migration class to handle the schema change. However, it looks like I have to drop to SQL as there is no Migration way of doing this. Here’s what it looks like:

    execute 'ALTER TABLE users ALTER organization_id DROP DEFAULT'

I wanted to be able to handle this in the Migration-syntax, but I don’t think that’s possible. It would be nice to do something like:

    change_column :users, :organization_id, :integer, :default => :null, :null => true

Rails redeploy issue resolved

Posted by Matt Parrish Sun, 06 Aug 2006 16:25:00 GMT

I just redeployed a Rails site for a client of mine, Real Idaho, using Capistrano and it didn’t work properly. I was able to resolve the issue, and here are the details in case this ever happens to you…

Running

$ rake deploy

gave the following output (some content suppressed):

...
transaction: commit
  * executing task restart
  * executing "/home/*****/apps/*****/current/script/process/reaper -d 'dispatch.fcgi'" 
    servers: ["216.118.83.207"]
    [216.118.83.207] executing command
 ** [out :: 216.118.83.207] bash: /home/*****/apps/*****/current/script/process/reaper: Permission denied
    command finished
rake aborted!
command "/home/******/apps/******/current/script/process/reaper -d 'dispatch.fcgi'" failed on 216.***.***.***

The problem is that the files in script/ and the dispatch.* files in public need to be executable, but when Capistrano pulls down the latest Subversion code, those files are checked out with the permissions:

-rw-r--r--

which is read/write for the owner and read-only for everyone else, with no execute bit set.

In order to have Subversion pull down the files with the correct, executable, permissions, you must set the svn:executable property on each executable file and commit the change to Subversion. For a single file (FILE is a placeholder), the command is:

$ svn propset svn:executable "*" FILE

I found the following article, Subversion Primer for Rails projects, which has a nice script to automate the task of setting the necessary files as executable. Here’s the script:

$ svn propset svn:executable "*" `find script -type f | grep -v '.svn'` public/dispatch.*

Then, run

$ svn commit

to commit those changes to Subversion.
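
Before committing, it can help to list which files are still missing the execute bit in the working copy. The following is a rough sketch in plain Ruby; `missing_executable_bit` is a hypothetical helper of mine, not part of Subversion, Capistrano, or Rails, and it only inspects file modes on disk rather than the svn:executable property itself.

```ruby
# Hypothetical helper: list regular files under script/ and public/dispatch.*
# in `root` that are not executable by their owner.
def missing_executable_bit(root)
  # Without File::FNM_DOTMATCH, "**" does not descend into hidden
  # directories, so .svn metadata is skipped.
  candidates  = Dir.glob(File.join(root, "script", "**", "*"))
  candidates += Dir.glob(File.join(root, "public", "dispatch.*"))
  candidates.select do |path|
    File.file?(path) && (File.stat(path).mode & 0o100).zero?
  end.sort
end

# Run from the root of a checked-out Rails application.
missing_executable_bit(".").each { |path| puts path }
```

An empty listing on a fresh checkout would suggest the property was committed for every file the deployment needs to execute.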

I think this should resolve the issue, but I haven’t redeployed yet, so I haven’t verified that this will really fix my deployment issue.