Desmond Brand

Pressing buttons in the right order

Using nginx as a reverse proxy for speedy App Engine development

Context: I recently posted about using Apache as a reverse-proxy for Google App Engine development. See that post for the motivation for monkeying with Real Web Servers when what you really want to be doing is developing your GAE app. The quick version is that you can reduce your local pageload time by a factor of 20 or more, depending on how many static assets (images, javascript, stylesheets) your app has.

Proxying with Apache was a big improvement over using the single-threaded, I/O blocking dev_appserver.py directly, but it was by no means perfect. I noticed that if I hadn’t hit Apache for a while, it’d take roughly 10 seconds before my next request would be served. If you know what you’re doing, it’s probably possible to configure Apache not to do this. But I don’t know what I’m doing, and Apache configuration isn’t exactly friendly.

Instead, I switched to using nginx, which is sometimes described as a reverse proxy first and webserver second. It’s insanely fast at serving static files, uses little memory, and configuring it turned out to be easy. Also, who can resist that logo?

How to set up nginx as a reverse proxy

First, install nginx. With homebrew on OS X this is just brew install nginx.

Next, we need to configure nginx to work as a reverse proxy. The following configuration did the trick for me. I put this file at /usr/local/etc/nginx/kadev.conf, as the include refers to a file mime.types in that directory.

Notice how DRY this config is compared to the Apache equivalent. It’s really easy to add extra server sections if you have multiple development servers.

/usr/local/etc/nginx/kadev.conf
worker_processes        1;

events {
    worker_connections  1024;
}

http {
    include             mime.types;
    default_type        application/octet-stream;
    sendfile            on;
    keepalive_timeout   65;

    server {
        listen          khanacademy.dev:80;
        server_name     khanacademy.dev;
        root            /Users/dmnd/Projects/khan/src/stable;

        location / {
            try_files   $uri   @proxy;
        }

        location @proxy {
            proxy_pass   http://127.0.0.1:8080;
        }
    }
}

Line 19, the try_files directive, is where the magic happens. It tells nginx to check first for a file under root matching the path of the request. If a file exists, nginx serves it directly. Otherwise, nginx forwards the request to the proxied server, in this case creaky old dev_appserver.py.
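If the file-first, proxy-second decision is hard to picture, here is a rough Python sketch of it. This is not how nginx is implemented; the function name resolve and its return format are made up for illustration, and the upstream default is simply the proxy_pass address from the config above.

```python
import os

def resolve(root, uri, upstream="http://127.0.0.1:8080"):
    """Mimic `try_files $uri @proxy`: use a file under root if one exists, else proxy."""
    # Drop the query string and the leading slash before joining onto root
    path = uri.split("?", 1)[0].lstrip("/")
    root_abs = os.path.abspath(root)
    candidate = os.path.abspath(os.path.join(root_abs, path))
    # Refuse anything that escapes root (e.g. "/../etc/passwd")
    inside_root = candidate.startswith(root_abs + os.sep)
    # Only regular files count as a match; directories fall through to the proxy
    if inside_root and os.path.isfile(candidate):
        return ("static", candidate)
    return ("proxy", upstream + uri)
```

Calling resolve with the root from the config and a path like /images/logo.png returns a "static" result only if that file exists on disk; everything else comes back as a "proxy" result pointing at the dev server.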

Finally, run nginx with sudo nginx -c ~/path/to/config/file.conf. sudo is needed to listen on port 80. If you’re running on another port, you don’t need it.

Now you should be able to use your proxy to serve your static files quickly so you can get on with development.

Debugging tips

If something doesn’t work, here are some things to try.

  • Check that your configuration is valid with nginx -c ~/path/to/config/file.conf -t.

  • The above configuration tells nginx to listen on port 80, so if you have Apache running you will need to disable it first. On OS X, open System Preferences, select Sharing, then uncheck “Web Sharing”. Or you can tell it to exit with sudo apachectl -k stop.

  • If you installed with brew install nginx, logs will be stored at /usr/local/Cellar/nginx/1.0.7/logs by default (the version number in that path will match whichever version you installed). If something isn’t working, tail -f the files in that directory.
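For the port 80 conflict in particular, a quick check from Python can tell you whether something is already listening before you start nginx. This is a minimal sketch; port_in_use is a hypothetical helper name, and it only tests the loopback address by default.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0

# Example: port_in_use(80) is True while Apache (or nginx) is still running
```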

If you needed to do anything else to get it running, please leave a comment to help the next person.

Speed up your App Engine dev server with an Apache reverse proxy

When using the Google App Engine SDK in development mode, you have probably noticed that dev_appserver.py is incredibly slow. This is because all requests – even requests for static files like javascript, stylesheets or images – are handled sequentially by a single thread. Take a look at the timeline below. Does the left side look familiar?

At Khan Academy, a single pageload in development mode might end up requesting as many as 100 different resources. This takes dev_appserver.py more than 35 seconds to serve, which sucks. Coupled with Webkit’s annoyingly aggressive cache, this can really kill your productivity.

It’s possible to get a big speed boost by setting up Apache as a reverse proxy in front of your dev server. All requests for static assets will be handled by Apache, which is blazing fast compared to dev_appserver.py. The 35 second request above is fulfilled in only 2 seconds, with most of the static files loading in parallel (see the right side).
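To see why handling requests one at a time hurts so much, here is a toy model of sequential versus parallel serving. It is only a sketch under simplifying assumptions: every request has a fixed duration, requests are handed to whichever worker frees up first, and the function names are invented for this example. It does not try to reproduce the 35 second and 2 second measurements above, since Apache also serves each individual file faster.

```python
import heapq

def sequential_time(durations):
    """Total time when a single thread handles requests one after another."""
    return sum(durations)

def parallel_time(durations, workers):
    """Total time when `workers` requests can be in flight at once."""
    # Each heap entry is the time at which one worker becomes free
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)
```

With 100 equal-length requests, parallel_time with 6 workers needs 17 rounds instead of 100, so the model predicts roughly a sixfold improvement from parallelism alone. The worker count of 6 is just an assumed figure for how many connections a browser might open at once.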

Update 2011-11-15: It turns out nginx is even quicker at serving static files, uses less memory, and is easier to configure too! From here on I recommend reading this post instead, which tells you how to set up the same thing with nginx rather than Apache.

How to set this up

First, enable Virtual Hosts in Apache. Edit /etc/apache2/httpd.conf and go to line 623. Uncomment the line for vhosts, so it looks like the following.

/etc/apache2/httpd.conf
# Virtual hosts
Include /private/etc/apache2/extra/httpd-vhosts.conf

Next, open /etc/apache2/extra/httpd-vhosts.conf and insert something like the following. Fellow KA devs shouldn’t have to edit much, but if you’re working on a different app you will obviously have to change the static directories. Look in app.yaml to see the full list of statically served paths.

/etc/apache2/extra/httpd-vhosts.conf
<VirtualHost *:80>
    ServerName khanacademy.local

    # don't proxy these paths. Instead, serve them directly from apache
    ProxyPass /javascript !
    Alias /javascript "/Users/dmnd/Projects/khan/src/stable/javascript"

    ProxyPass /stylesheets !
    Alias /stylesheets "/Users/dmnd/Projects/khan/src/stable/stylesheets"

    ProxyPass /images !
    Alias /images "/Users/dmnd/Projects/khan/src/stable/images"

    ProxyPass /gae_bingo/static !
    Alias /gae_bingo/static "/Users/dmnd/Projects/khan/src/stable/gae_bingo/static"

    ProxyPass /gae_mini_profiler/static !
    Alias /gae_mini_profiler/static "/Users/dmnd/Projects/khan/src/stable/gae_mini_profiler/static"

    ProxyPass /khan-exercises !
    Alias /khan-exercises "/Users/dmnd/Projects/khan/src/stable/khan-exercises"

    # everything else gets proxied through to the dev server
    ProxyPass / http://localhost:8080/

    # let apache rewrite URLs in response headers
    ProxyPassReverse / http://localhost:8080/

    # give apache some permissions to the src directory so it can serve static files
    <Directory "/Users/dmnd/Projects/khan/src">
        Options Indexes FollowSymLinks Includes ExecCGI
        AllowOverride All
        Order allow,deny
        Allow from all
        AddDefaultCharset utf-8
    </Directory>
</VirtualHost>

Finally, map the ServerName you picked to localhost by editing your /etc/hosts file. See line 12 below.

/etc/hosts
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1   localhost
255.255.255.255 broadcasthost
::1             localhost
fe80::1%lo0 localhost
# Easy access to app engine dev server
127.0.0.1   khanacademy.local

This allows you to access your dev server via something other than localhost, which is needed for the virtual host to work. If you don’t already have --address=0.0.0.0 as a parameter to dev_appserver.py you will need to add this.
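If the new hostname doesn't seem to resolve, it can help to check what your hosts file actually says. The following is a small sketch of a hosts-file parser; parse_hosts is a hypothetical helper, and it only handles the simple address-then-names format shown above.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the list of addresses it is given."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping

# Example: parse_hosts(open("/etc/hosts").read()).get("khanacademy.local")
```

If the entry was added correctly, looking up khanacademy.local in the result should give a list containing 127.0.0.1.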

Also, Apache needs to be enabled - the easiest way to do this is to go to Sharing under System Preferences and check the “Web Sharing” item. If you already have it enabled, you may need to clear and check it again to force a restart. If it doesn’t start, check your config syntax with apachectl -St.

This setup should work on OS X Lion. Small changes might be needed for other OSes. If you had to tweak anything, mention it in the comments.

Your own personal hgignore file

Sometimes people don’t agree on the contents of the tracked .hgignore file in the repository root. For example, I don’t like having *orig in .hgignore as having backup files show up when I grep is annoying. I solved that problem by removing the *orig pattern and telling other repository users about hg purge.

But today I found another way to deal with different opinions for ignore files. Hidden away on the Mercurial wiki is a nice tip about per-user hgignore files. In a repository’s hgrc you can reference an arbitrary file to be used in addition to the tracked .hgignore file. No more .hgignore wars!
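As far as I know, the setting involved is ignore (or ignore.<name>) in the [ui] section of the hgrc. Here is a sketch that adds such an entry to hgrc text using Python's configparser. The helper name, the ignore.personal suffix and the example path are all made up for illustration, and configparser may not cope with every hgrc feature (such as %include lines), so treat this as a starting point rather than a robust tool.

```python
import configparser
import io

def add_personal_ignore(hgrc_text, ignore_path):
    """Return hgrc text with a [ui] ignore.personal entry pointing at ignore_path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(hgrc_text)
    if not parser.has_section("ui"):
        parser.add_section("ui")
    # "personal" is an arbitrary label; only the "ignore." prefix matters here
    parser.set("ui", "ignore.personal", ignore_path)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

# Example: add_personal_ignore(open(".hg/hgrc").read(), "~/.hgignore_personal")
```

The file you point at uses the same syntax as the tracked .hgignore, so you can keep patterns like *orig there without touching the shared file.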

Caffeinated keeps your PC awake

I use an excellent if simple program called Caffeine on OS X. Its only purpose is to temporarily prevent your computer from automatically sleeping, or displaying the screensaver. A similar program called Insomnia is available for Windows, but I dislike its UI.

So, I built Caffeinated. It’s a port of Caffeine that runs on Windows. The UI is straight-up lifted from Caffeine, and the entire program is pretty much just a usable wrapper around the SetThreadExecutionState function from the Windows API.
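To give a feel for how little is involved, here is a sketch of the same Windows API call made from Python via ctypes. This is not Caffeinated's code; the helper names awake_flags and set_awake are invented for the example. The ES_* constants are the documented values for SetThreadExecutionState, and on anything other than Windows the function simply does nothing.

```python
import ctypes
import sys

# Flag values documented for SetThreadExecutionState
ES_CONTINUOUS = 0x80000000
ES_SYSTEM_REQUIRED = 0x00000001
ES_DISPLAY_REQUIRED = 0x00000002

def awake_flags(keep_display_on=True):
    """Build the flags that ask Windows to stay awake until told otherwise."""
    flags = ES_CONTINUOUS | ES_SYSTEM_REQUIRED
    if keep_display_on:
        flags |= ES_DISPLAY_REQUIRED
    return flags

def set_awake(enabled, keep_display_on=True):
    """Turn the stay-awake request on or off. Returns False when not on Windows."""
    if sys.platform != "win32":
        return False
    func = ctypes.windll.kernel32.SetThreadExecutionState
    func.argtypes = [ctypes.c_uint]
    func.restype = ctypes.c_uint
    # Passing ES_CONTINUOUS alone clears any earlier request
    flags = awake_flags(keep_display_on) if enabled else ES_CONTINUOUS
    # The API returns the previous state, or 0 if the call failed
    return func(flags) != 0
```

The request applies to the calling thread, so a real program has to keep that thread alive for as long as it wants the machine to stay awake.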

Despite its simplicity, I find it useful, and maybe you will too. Read more about it, or download it now.

Django’s inclusion_tag is not cached in AppEngine

Since we started using the massively convenient GAE Mini Profiler, we were surprised to discover that we spend a significant amount of time reading files from disk. Here’s a particularly extreme example:

This was contrary to my understanding that App Engine tries to cache almost any code-related file read. After investigating, I found that App Engine does try to cache templates – see the source of template.py. But it turns out this only works when you render a template directly with webapp.template.render, and not when you use Django’s inclusion_tag.

To verify this, I put together a basic page with some repeated template use and used opensnoop (and after discovering that tool, I need to learn more about DTrace) to observe which files were opened. Here’s the result when using inclusion_tag. You can see the template simple_student_info.html was loaded over and over again:

$ sudo opensnoop -n Python
  UID    PID COMM          FD PATH
  501  27864 Python         7 .
  501  27864 Python         6 /Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/../../VERSION
  501  27864 Python         6 /var/folders/TX/TXTcFXvEFKKsTqfua-9AGE+++TI/-Tmp-/request.7SyMKG.tmp
  501  27864 Python         6 /var/folders/TX/TXTcFXvEFKKsTqfua-9AGE+++TI/-Tmp-/request.7SyMKG.tmp
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/templatetest.html
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html

When calling webapp.template.render directly, the template is only read once:

$ sudo opensnoop -n Python
  UID    PID COMM          FD PATH
  501  27864 Python         6 /Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/../../VERSION
  501  27864 Python         6 /var/folders/TX/TXTcFXvEFKKsTqfua-9AGE+++TI/-Tmp-/request.j-MxJs.tmp
  501  27864 Python         6 /var/folders/TX/TXTcFXvEFKKsTqfua-9AGE+++TI/-Tmp-/request.j-MxJs.tmp
  501  27864 Python         7 .
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/templatetest.html
  501  27864 Python         7 .
  501  27864 Python         7 /Users/dmnd/projects/khan/src/desmond/simple_student_info.html
  501  27864 Python         7 .
  501  27864 Python         7 .
  501  27864 Python         7 .
  501  27864 Python         7 .
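With longer traces, counting by eye gets tedious. Here is a small sketch that tallies how many times each path shows up in opensnoop output like the above. The function name count_opens is made up, and it assumes the five-column UID, PID, COMM, FD, PATH layout shown in these traces.

```python
from collections import Counter

def count_opens(output):
    """Count how many times each path appears in opensnoop output."""
    counts = Counter()
    for line in output.splitlines():
        # Split into at most five fields so paths containing spaces stay whole
        fields = line.split(None, 4)
        # Skip the shell prompt, the header row and anything else that isn't data
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[fields[4]] += 1
    return counts
```

Run against the first trace, the entry for simple_student_info.html comes out at 5; against the second it is 1.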

As we’re already using inclusion_tag all over the place, I added support for AppEngine’s template caching to it by replacing all usages of django.template.Library with my own subclass. You can use it by including this file in your project, and changing the top of your templatetags.py files like so:

Without caching.py
from google.appengine.ext import webapp
register = webapp.template.create_template_register()

With caching.py
import template_cached
register = template_cached.create_template_register()

Caveats: we’re still using Django 0.96, so there’s a chance this only applies to that version of Django on AppEngine.
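The general idea behind the fix, reading each template from disk once and reusing it afterwards, can be sketched in a few lines. This is not the code in the file linked above and has nothing Django-specific in it: CachedTemplateLoader is a hypothetical class that caches raw file contents rather than compiled templates, and it never invalidates its cache.

```python
import os

class CachedTemplateLoader:
    """Load template files from disk at most once per path."""

    def __init__(self):
        self._cache = {}
        self.disk_reads = 0  # how many times we actually touched the disk

    def load(self, path):
        key = os.path.abspath(path)
        if key not in self._cache:
            with open(key, encoding="utf-8") as f:
                self._cache[key] = f.read()
            self.disk_reads += 1
        return self._cache[key]
```

Loading the same template five times through one loader leaves disk_reads at 1, which is the behaviour the second opensnoop trace shows for webapp.template.render.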