Reworking Net::SFTP to handle large file downloads

Posted by Matt Parrish Tue, 26 Jun 2007 19:13:00 GMT

I’m writing an application that downloads access logs from our production servers and runs the AWStats package against them to create the statistics web pages. This process is set up as a Rake task that uses the Net::SFTP library used by Capistrano, written by Jamis Buck. There is also a front-end Rails application to manage each of the applications to be retrieved. Everything was working great until I tried to grab a 550MB file from one of our servers. Net::SFTP choked as it ran out of memory.

It turns out that the command:

    sftp.get_file log_file, local_file

ends up putting the whole file into memory, which is fine for small files, but not the large one that I was trying to download. Luckily it wasn’t too bad to refactor my class. Here’s the new code to achieve the same effect as the above sftp.get_file command.


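        # Ask the server for the file size so we know when to stop.
        stat = sftp.stat( log_file )
        offset = 0
        file_length = stat.size
        length = 64 * 1024 * 1024   # request 64MB per read
        File.open(local_file, File::CREAT|File::TRUNC|File::RDWR, 0644) do |f|
          # Open the remote handle once and reuse it for every chunk.
          sftp.open_handle(log_file) do |handle|
            while (offset < file_length)
              data = sftp.read(handle, :length => length, :offset => offset)
              # Stop on an empty read so the loop cannot spin forever.
              break if data.nil? || data.empty?
              f.write(data)
              offset += data.length
            end
          end
        end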

This downloads the file in 64MB increments, using only that much memory at any time.
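
The chunking loop itself does not depend on Net::SFTP. Here is a minimal sketch of the same pattern in plain Ruby, so it can be tried without a server. The `copy_in_chunks` helper, its parameters, and the StringIO stand-ins are all assumptions made up for illustration; none of them are part of the Net::SFTP API.

```ruby
require 'stringio'

# Hypothetical helper: copies total_size bytes to `out` by repeatedly asking
# `read_chunk` (any callable taking an offset and a length) for the next piece.
# Returns the number of bytes actually copied.
def copy_in_chunks(read_chunk, out, total_size, chunk_size)
  offset = 0
  while offset < total_size
    data = read_chunk.call(offset, chunk_size)
    # Stop on a nil or empty read so a short source cannot cause an endless loop.
    break if data.nil? || data.empty?
    out.write(data)
    offset += data.bytesize
  end
  offset
end

# Stand-in for the remote file: 10 bytes, fetched 4 at a time.
source = StringIO.new("0123456789")
reader = lambda do |offset, length|
  source.seek(offset)
  source.read(length)
end

dest = StringIO.new
copied = copy_in_chunks(reader, dest, source.size, 4)
```

Only one chunk is held in memory at a time, so the chunk size, not the file size, bounds the memory used by the copy.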

Switching from Pound to Nginx

Posted by Matt Parrish Tue, 05 Dec 2006 20:08:00 GMT

I just switched some Ruby on Rails apps I’m running from Pound to Nginx based on the results from some articles I’ve read online. The two biggest advantages of Nginx are (1) its raw performance and (2) its ability to serve static files, which is great for running Capistrano’s disable_web command to show a maintenance page when redeploying an application.

At work, we’re working on a standard Ruby on Rails setup and are currently investigating two options. The first is a Mongrel cluster running behind Nginx, as I described above. The second option is fronting the Mongrel cluster with Lighttpd. Since the 1.4.x series of Lighttpd is known to have some issues with its mod_proxy implementation, we would use Pen until a stable 1.5 version is released.

I’ll post another article once we have finished our evaluation and chosen which option we’re going to deploy at work. Stay tuned…

Rails redeploy issue resolved

Posted by Matt Parrish Sun, 06 Aug 2006 16:25:00 GMT

I just redeployed a Rails site for a client of mine, Real Idaho, using Capistrano and it didn’t work properly. I was able to resolve the issue, and here are the details in case this ever happens to you…

Running

$ rake deploy

gave the following output (some content suppressed):

...
transaction: commit
  * executing task restart
  * executing "/home/*****/apps/*****/current/script/process/reaper -d 'dispatch.fcgi'" 
    servers: ["216.118.83.207"]
    [216.118.83.207] executing command
 ** [out :: 216.118.83.207] bash: /home/*****/apps/*****/current/script/process/reaper: Permission denied
    command finished
rake aborted!
command "/home/******/apps/******/current/script/process/reaper -d 'dispatch.fcgi'" failed on 216.***.***.***

The problem is that the files in script/ and the dispatch.* files in public/ need to be executable, but when Capistrano checks out the latest code from Subversion, those files get the default permissions:

-rw-r--r--

which is read/write for the owner and read-only for everyone else, with no execute bit set.
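
To make the permission bits concrete, here is a small Ruby sketch that decodes a numeric mode the way `ls -l` displays it and checks for execute bits. The helper names `permission_string` and `any_execute_bit?` are made up for illustration.

```ruby
# Hypothetical helper: true when any of the three execute bits
# (owner, group, other) is set in a numeric file mode.
def any_execute_bit?(mode)
  (mode & 0o111) != 0
end

# Hypothetical helper: renders the low nine permission bits in the style of
# `ls -l`, without the leading file-type character.
def permission_string(mode)
  flags = 'rwxrwxrwx'
  (0..8).map { |i| (mode & (0o400 >> i)).zero? ? '-' : flags[i] }.join
end

default_mode    = 0o644  # what the checked-out files had
executable_mode = 0o755  # a typical mode for a runnable script

default_string    = permission_string(default_mode)
executable_string = permission_string(executable_mode)
```

Here `default_string` is `rw-r--r--`, matching the listing above, and `any_execute_bit?(default_mode)` is false, which is why the shell refuses to run reaper.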

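In order to have Subversion check out the files with the correct, executable, permissions, you must set the svn:executable property on each executable file and commit the change to Subversion. The command takes a property value and the file to set it on (path/to/file is a placeholder):

$ svn propset svn:executable "*" path/to/file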

I found the following article, Subversion Primer for Rails projects, which has a nice script to automate the task of setting the necessary files as executable. Here’s the script:

$ svn propset svn:executable "*" `find script -type f | grep -v '.svn'` public/dispatch.*

Then, run

$ svn commit

to commit those changes to Subversion.
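
Before running the propset command, it can help to see which files are affected. The following is a rough Ruby sketch, assuming the standard Rails layout used above (script/ and public/dispatch.*). The `files_missing_exec_bit` helper is hypothetical and only inspects the working copy; it does not call svn.

```ruby
require 'find'

# Hypothetical helper: returns the regular files under script/ (skipping .svn
# directories) plus public/dispatch.* that have no execute bit set.
def files_missing_exec_bit(root)
  candidates = []
  script_dir = File.join(root, 'script')
  if File.directory?(script_dir)
    Find.find(script_dir) do |path|
      if File.directory?(path)
        # Do not descend into Subversion's metadata directories.
        Find.prune if File.basename(path) == '.svn'
      elsif File.file?(path)
        candidates << path
      end
    end
  end
  candidates.concat(Dir.glob(File.join(root, 'public', 'dispatch.*')))
  # Keep only the files whose mode has none of the three execute bits.
  candidates.select { |path| (File.stat(path).mode & 0o111).zero? }.sort
end
```

Calling `files_missing_exec_bit('.')` from the root of the working copy should list the files that still need the property.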

I think this should resolve the issue, but I haven’t redeployed yet, so I haven’t verified that this will really fix my deployment issue.