
Re: Slug header encoding

On Apr 13, 2007, at 11:00 AM, John Panzer wrote:
> Whether or not RFC 2616 is less than clear on this point (I personally find it clear as mud), there's no ambiguity about what happens when you send an "X-Foo: eà" header to Apache running mod_jk sending data to a Tomcat servlet container: It passes the data correctly if you use the ISO-8859-1 encoding, and it corrupts the data if you use a UTF-8 encoding. At least in our tests. (Note that this happens before the data leaves the Apache process, so there's not even an opportunity to fix this at the servlet container level.)

Hmm, interesting... eà in UTF-8 would be e%C3%A0, so the problem could
be either that something is counting "characters" instead of bytes
(causing the string to be truncated) or that something is removing
spaces using an algorithm that is only 7-bit clean (0xA0 & 0x7F = 0x20,
an ASCII space). That is, assuming we accept the premise that UTF-8 is
valid within HTTP header fields, which it is not; still, we generally
try not to lose data in Apache regardless of the standard (for robustness).
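To make the two suspected failure modes concrete, here is a minimal Python sketch. Both functions (truncate_by_char_count and strip_spaces_7bit) are hypothetical stand-ins for the bugs described above, not code taken from Apache, mod_jk, or Tomcat.

```python
# Hypothetical stand-ins for the two suspected bugs; neither is taken
# from Apache, mod_jk, or Tomcat.

def truncate_by_char_count(value: str, encoding: str) -> bytes:
    """Encode, then keep only as many bytes as there are characters."""
    raw = value.encode(encoding)
    return raw[:len(value)]

def strip_spaces_7bit(raw: bytes) -> bytes:
    """Drop every byte that looks like an ASCII space once its high bit is cleared."""
    return bytes(b for b in raw if (b & 0x7F) != 0x20)

value = "eà"
for enc in ("iso-8859-1", "utf-8"):
    raw = value.encode(enc)
    print(enc, raw.hex(" "), "|",
          truncate_by_char_count(value, enc).hex(" "), "|",
          strip_spaces_7bit(raw).hex(" "))
# iso-8859-1 65 e0 | 65 e0 | 65 e0
# utf-8 65 c3 a0 | 65 c3 | 65 c3
```

Under these assumed bugs the ISO-8859-1 bytes survive either way, while the UTF-8 bytes lose the trailing 0xA0 either way. That fits the reported behaviour, but for this particular header it cannot tell the two explanations apart.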

Have you tested it with a different two-byte UTF-8 character?
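One way to choose that test character, sketched in Python under the assumptions stated above: a character-counting bug would damage any two-byte UTF-8 character, whereas a 7-bit space stripper would only damage characters whose final byte masks to 0x20. The helper masks_to_space is hypothetical and only checks that final byte.

```python
def masks_to_space(ch: str) -> bool:
    """True if the last UTF-8 byte of ch, with its high bit cleared, is ASCII space."""
    return (ch.encode("utf-8")[-1] & 0x7F) == 0x20

# Two-byte UTF-8 sequences cover U+0080..U+07FF.
collisions = [chr(cp) for cp in range(0x80, 0x800) if masks_to_space(chr(cp))]
print(len(collisions))                               # 30
print([f"U+{ord(c):04X}" for c in collisions[:3]])   # ['U+00A0', 'U+00E0', 'U+0120']

# é (U+00E9) encodes as C3 A9, and 0xA9 & 0x7F == 0x29 (')'), so a 7-bit
# space stripper would leave it alone while a character-count bug would not.
print(masks_to_space("é"))                           # False
```

If "X-Foo: eé" comes through intact while "X-Foo: eà" does not, that would point at the space stripping rather than at a byte/character count mix-up.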

....Roy