
Re: Slug header encoding



Bjoern Hoehrmann wrote:
* John Panzer wrote:
  
I don't think you're saying this, but it sounds like you're saying that 
you can simply write the UTF-8 byte sequences in the header.  For the 
record: The problem here is that HTTP defines header fields to be 
Latin-1.  Coincidentally, I am currently engaged in debugging a problem 
in which someone is sending UTF-8 encoded bytes via an HTTP header, 
which then get corrupted, somewhere inside either Apache or mod_jk.  
    

I would argue RFC 2616 is less than clear in this regard and as far as I
can tell there is little consensus among deployed servers and agents how
to interpret this. I would certainly hope a future version of RFC 2616
requires servers to use UTF-8 for the protocol-defined text parts of the
messages and clients to assume a different encoding only if it is not
UTF-8 encoded to accommodate legacy applications as necessary.
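
For illustration, the "assume UTF-8, fall back only if it is not UTF-8" rule proposed above could be sketched roughly as follows. The function name `decode_header_value` is hypothetical, and ISO-8859-1 is assumed here as the legacy fallback; no current specification or server is claimed to behave this way.

```python
def decode_header_value(raw: bytes) -> str:
    """Decode header bytes as UTF-8, falling back to ISO-8859-1.

    Hypothetical helper: a sketch of the proposed rule, not the
    behavior of any particular server or client.
    """
    try:
        # Strict UTF-8 decoding rejects most byte sequences that are
        # really Latin-1 text containing non-ASCII characters.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        return raw.decode("iso-8859-1")


# The same text arrives intact in either encoding.
print(decode_header_value("eà".encode("utf-8")))
print(decode_header_value("eà".encode("iso-8859-1")))
```

The heuristic is not airtight: a few Latin-1 byte sequences also happen to be valid UTF-8 and would be read as UTF-8.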
  
Whether or not RFC 2616 is less than clear on this point (I personally find it clear as mud), there's no ambiguity about what happens when you send an "X-Foo: eà" header to Apache running mod_jk, which forwards the data to a Tomcat servlet container: it passes the data correctly if you use the ISO-8859-1 encoding, and it corrupts the data if you use a UTF-8 encoding.  At least in our tests.  (Note that this happens before the data leaves the Apache process, so there's not even an opportunity to fix this at the servlet container level.)
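
To make the byte-level difference concrete, here is a minimal sketch of what the two encodings of "eà" look like on the wire, and what a receiver that assumes Latin-1 would see. This only illustrates the general mismatch; it is an assumption that something similar is involved in the Apache/mod_jk case, and the exact corruption observed there may differ.

```python
value = "eà"  # the header value from the X-Foo example

latin1_bytes = value.encode("iso-8859-1")  # e, then the single byte 0xE0
utf8_bytes = value.encode("utf-8")         # e, then the two bytes 0xC3 0xA0


def read_as_latin1(raw: bytes) -> str:
    """Interpret header bytes the way a Latin-1-only receiver would."""
    return raw.decode("iso-8859-1")


print(latin1_bytes.hex(), repr(read_as_latin1(latin1_bytes)))
print(utf8_bytes.hex(), repr(read_as_latin1(utf8_bytes)))
```

The Latin-1 bytes round-trip unchanged, while the UTF-8 bytes come out as three characters: "e", "Ã", and a no-break space.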

--
John Panzer
System Architect
http://abstractioneer.org