"Transfer-Encoding: chunked", or, Chunky HTTP!

Providing a web service for a bunch of browsers is a relatively straightforward affair. It’s really only once you jump out of the back-end and into the front-end side of things that issues like browser incompatibilities start to become a problem. Thankfully, I feel like I’ve got my head wrapped around what’s involved in solving those problems.

And then mobile phones came along.

The service I’m working on at the moment is consumed by a bunch of clients, including but not limited to web browsers, WAP browsers, the Flash player, iPhones and J2ME devices. It’s the last of these that’s causing headaches at the moment.

You see, despite the fact that HTTP/1.1 is about 9 years old, not all web servers support the features it introduced. The particular one I’m talking about is chunked transfer encoding, but I’m sure there are many others.

To give you a general idea, the HTTP implementations on many mobile handsets will decide to use chunked transfer encoding if the payload of a PUT/POST request is over an arbitrary threshold. This causes issues with servers like Nginx, Lighttpd and Thin, since most of them assume that an incoming HTTP request with a payload will also include a Content-Length header.
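To make that concrete, here’s a rough sketch in Python of how such a client might frame a request body. The function name `frame_body` and the `threshold` and `chunk_size` defaults are made up for illustration; real handsets pick their own values and I’m not describing any particular one.

```python
def frame_body(payload: bytes, threshold: int = 1024, chunk_size: int = 512):
    """Return (headers, wire_body) the way a threshold-based client might.

    threshold and chunk_size are hypothetical values for illustration only.
    """
    if len(payload) <= threshold:
        # Small payload: send it as-is and announce its length up front.
        return {"Content-Length": str(len(payload))}, payload

    parts = []
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # Each chunk is: size in hexadecimal, CRLF, the data, CRLF.
        parts.append(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
    # A zero-length chunk followed by a blank line terminates the body.
    parts.append(b"0\r\n\r\n")
    return {"Transfer-Encoding": "chunked"}, b"".join(parts)
```

The important part is the second branch: once the payload crosses the threshold there is no Content-Length header at all, only the per-chunk sizes inside the body.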

Well, guess what? That hasn’t been a safe assumption for 9 years.

Take a look at this request:

HTTP request specifying “chunked” transfer encoding
POST /search.json HTTP/1.1
User-Agent: curl/7.16.4 (i486-pc-linux-gnu) libcurl/7.16.4 OpenSSL/0.9.8e zlib/1.2.3.3 libidn/1.0
Host: ooboontoo
Accept: */*
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=----------------------------ab5090ac7869


92
------------------------------ab5090ac7869
Content-Disposition: form-data; name="query"

zoooom
------------------------------ab5090ac7869--


0

Notice anything? If not, try and find a Content-Length header. A server that insists on knowing the length up front would answer with an HTTP 411 error (Length Required), but the spec is pretty clear about what HTTP/1.1 applications are obliged to do:

All HTTP/1.1 applications that receive entities MUST accept the “chunked” transfer-coding (section 3.6), thus allowing this mechanism to be used for messages when the message length cannot be determined in advance.
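Accepting the coding isn’t a lot of work, either. Here’s a minimal sketch of a decoder in Python, run against the body of the request above. The name `decode_chunked` is my own, trailers are ignored, and the payload is rebuilt assuming CRLF line endings and the 28-dash boundary from the Content-Type header.

```python
def decode_chunked(wire: bytes) -> bytes:
    """Reassemble the payload from a chunked message body (trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = wire.index(b"\r\n", pos)
        # The chunk size is hexadecimal; anything after ";" is a chunk extension.
        size = int(wire[pos:line_end].split(b";")[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk marks the end of the body
        body += wire[pos:pos + size]
        if wire[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(body)


# Rebuild the multipart payload from the request shown above.
BOUNDARY = b"-" * 28 + b"ab5090ac7869"
payload = b"\r\n".join([
    b"--" + BOUNDARY,
    b'Content-Disposition: form-data; name="query"',
    b"",
    b"zoooom",
    b"--" + BOUNDARY + b"--",
    b"",
])

# Frame it as a single chunk, the way the request above does.
wire = b"%x\r\n" % len(payload) + payload + b"\r\n0\r\n\r\n"

decoded = decode_chunked(wire)
content_length = len(decoded)  # what a Content-Length header would have said
```

The payload comes to 146 bytes, which is 92 in hexadecimal: exactly the number sitting on the line before the multipart data in the request. Once the body is decoded, the missing Content-Length is just `len(decoded)`.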

That’s why I find it so surprising that hacks are needed to let mobile clients POST/PUT to what I’d traditionally thought of as HTTP/1.1-compliant web servers.

But anyway, you want to see the solution, right?

Well, our initial solution involved Gerald Kaszuba writing a little web server in Python which went by the name of “Dechunker”. You can imagine what it did, but we quickly found that while it was the simplest way to avoid the problem, it also meant that over time we would end up needing to implement the functionality that was already available in most other web servers. Servers like Apache2 and Lighttpd have become incredibly hardened over the years, and we’re not going to achieve that overnight.

So I then took another look at Apache2, knowing that some modules do support chunked transfer encoding while others don’t. What I discovered was that Apache’s mod_proxy module could be used in front of anything that doesn’t support chunked encoding, since it can be configured to “dechunk” requests before passing them to a backend.

It looks a little something like this:

Apache configuration to reconstitute “chunked” HTTP requests
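# Reverse proxy only; never act as an open forward proxy
ProxyRequests Off

<Proxy http://localhost:81>
  Order deny,allow
  Allow from all
</Proxy>

# Public-facing host: accepts chunked requests from clients
<VirtualHost *:80>
  # Make mod_proxy send a Content-Length header to the backend
  SetEnv proxy-sendcl 1
  ProxyPass / http://localhost:81/
  ProxyPassReverse / http://localhost:81/
  ProxyPreserveHost On
  ProxyVia Full

  <Directory proxy:*>
    Order deny,allow
    Allow from all
  </Directory>

</VirtualHost>

Listen 81

# Backend host: the Rails application served by Phusion Passenger
<VirtualHost *:81>
  ServerName ooboontoo
  DocumentRoot /path/to/my/rails/root/public
  RailsEnv development
</VirtualHost>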

As you can see, I’ve got Apache2 listening on port 80, where the proxy-sendcl environment variable provided by mod_proxy makes it repack the HTTP body and add a Content-Length header to the request. The request is then passed on to a virtual host running on port 81, which is configured to use Phusion Passenger.
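Seen from the backend, the net effect on the request headers is roughly the following. This is only a sketch of the idea in Python, not how Apache implements it, and `to_content_length` is a name I’ve invented for the example. It assumes the body has already been read in full and decoded.

```python
def to_content_length(headers, body: bytes):
    """Swap Transfer-Encoding for Content-Length on an already-decoded body.

    headers is a list of (name, value) pairs; a new list is returned.
    """
    rewritten = [
        (name, value)
        for name, value in headers
        # Drop the chunked marker (and any stale length) before re-adding one.
        if name.lower() not in ("transfer-encoding", "content-length")
    ]
    rewritten.append(("Content-Length", str(len(body))))
    return rewritten


incoming = [
    ("Host", "ooboontoo"),
    ("Accept", "*/*"),
    ("Transfer-Encoding", "chunked"),
]
outgoing = to_content_length(incoming, b"x" * 146)
```

The catch with this approach is that the whole body has to be in hand before the length can be written, so the proxy ends up buffering each request rather than streaming it.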

Turns out it’s pretty simple, and from what I’ve seen there hasn’t been any negative performance impact from proxying all requests. It’s not a permanent solution, and as soon as Phusion Passenger fixes the chunked encoding bug, I’ll drop mod_proxy from our configuration.