From: Henry Spencer (henry@spsystems.net)
Date: Tue Jul 02 2002 - 21:56:00 CDT
On Tue, 2 Jul 2002, Charles Lindsey wrote:
> >Or were they related to bad handling of UTF-8 "overlong" sequences ? This
> >is a known security hole, and is why all software handling UTF-8 *MUST*
> >detect overlong sequences.
>
> Please could you explain in more detail the nature of this particular
> security hole?
It's not really a "hole", except in the hands of incompetent programmers.
The issue is that there can be more than one way to represent a given
character in UTF-8 -- for example, you could represent "/" (U+002F) as
0xc0,0xaf rather than the single byte 0x2f -- so code that looks for
dangerous metacharacters in the still-encoded bytes can miss them. (Newer
definitions of UTF-8 often forbid such "overlong" sequences; older ones
usually didn't.)
The obvious fix for this is to decode the UTF-8 first, and do your danger
recognition on 32-bit characters. That's hardly difficult.
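
A rough sketch of the difference, in Python. Everything named here is
illustrative: lenient_decode is a made-up stand-in for an old-style decoder
that tolerates overlong forms, and DANGEROUS is an arbitrary example set of
metacharacters. A scan of the raw bytes misses the overlong "/", while the
same check applied after decoding finds it.

```python
# Hypothetical set of metacharacters to look for; substitute your own.
DANGEROUS = {ord(c) for c in "/;|&`$<>"}

def lenient_decode(data):
    """Decode UTF-8 bytes to a list of code points, tolerating overlong forms.

    Stand-in for an old-style decoder: it checks sequence shape only.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif (b & 0xE0) == 0xC0:
            n, cp = 1, b & 0x1F
        elif (b & 0xF0) == 0xE0:
            n, cp = 2, b & 0x0F
        elif (b & 0xF8) == 0xF0:
            n, cp = 3, b & 0x07
        else:
            raise ValueError("bad lead byte at offset %d" % i)
        tail = data[i + 1:i + 1 + n]
        if len(tail) != n or any((t & 0xC0) != 0x80 for t in tail):
            raise ValueError("bad continuation after offset %d" % i)
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        out.append(cp)
        i += 1 + n
    return out

def bytes_look_dangerous(data):
    """Naive check on the still-encoded bytes."""
    return any(b in DANGEROUS for b in data)

def decoded_looks_dangerous(data):
    """The same check, applied to decoded code points."""
    return any(cp in DANGEROUS for cp in lenient_decode(data))

# ".." followed by an overlong encoding of "/"
sneaky = bytes([0x2E, 0x2E, 0xC0, 0xAF])
```

Here bytes_look_dangerous(sneaky) is false but decoded_looks_dangerous(sneaky)
is true, because the check now sees the same characters the consumer of the
decoded text will see.
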
The *best* fix for this is to have shells (etc.) with "sandbox" modes, so
the dangerous parts of the functionality can be disabled while dealing
with potentially-suspect input. This is *immensely* superior to trying to
duplicate the parsing of something like a shell, usually in a half-baked
and buggy way, in a separate filtering subsystem. Unfortunately, support
for this is not common.
So, of course, the fix that everyone is enthused about is neither the
obvious one nor the best one: forbid overlong sequences!
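
For reference, forbidding overlong sequences comes down to one extra
comparison in the decoder. The sketch below assumes the usual UTF-8 layout,
in which an n-byte sequence is only needed for code points at or above a
fixed minimum; the function and table names are illustrative, not taken
from any particular library.

```python
# Smallest code point that actually needs a sequence of each length.
MIN_CODE_POINT = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}
# Payload bits carried by the lead byte of a 2-, 3-, or 4-byte sequence.
LEAD_MASK = {2: 0x1F, 3: 0x0F, 4: 0x07}

def is_overlong(seq):
    """True if seq, a single UTF-8 sequence of 1 to 4 bytes, is overlong.

    Assumes the lead and continuation bytes already have the right shape;
    only the "shortest form" rule is checked here.
    """
    n = len(seq)
    if n not in MIN_CODE_POINT:
        raise ValueError("expected 1 to 4 bytes")
    if n == 1:
        return False
    cp = seq[0] & LEAD_MASK[n]
    for t in seq[1:]:
        cp = (cp << 6) | (t & 0x3F)
    # Overlong means a shorter sequence could have carried this code point.
    return cp < MIN_CODE_POINT[n]
```

With this, is_overlong(bytes([0xC0, 0xAF])) is true, while the ordinary
two-byte encoding of U+00E9 (0xc3,0xa9) is not flagged. Python's own strict
decoder applies the same rule: bytes([0xC0, 0xAF]).decode("utf-8") raises
UnicodeDecodeError.
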
I can't say I'm exactly against this -- I know of only one (sometimes)
legitimate use of overlong sequences, and that can be handled as a special
case (using 0xc0,0x80 to represent U+0000, to put it in strings without
confusing things that use 0x00 as a string terminator) -- but it seems
like putting the cart before the horse. The need for it is indicative of
failings elsewhere, which really ought to be dealt with rather than
papered over.
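
The special case mentioned above can be sketched as follows. This assumes
an otherwise strict UTF-8 codec and treats 0xc0,0x80 as the only permitted
overlong form; the two function names are made up for illustration. (The
same convention for U+0000 appears in Java's "modified UTF-8".)

```python
def encode_nul_safe(text):
    """Encode text as UTF-8, writing U+0000 as 0xc0,0x80 instead of 0x00.

    In valid UTF-8 a 0x00 byte can only mean U+0000, so a plain byte
    replacement is enough.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")

def decode_nul_safe(data):
    """Inverse of encode_nul_safe.

    0xc0 never occurs in strict UTF-8, so the pair 0xc0,0x80 is unambiguous.
    Every other overlong form is still rejected by the strict decoder.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")

encoded = encode_nul_safe("a\x00b")   # no 0x00 byte in the result
```

The encoded form can be passed through code that treats 0x00 as a string
terminator, and decode_nul_safe(encoded) gives back the original string.
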
Henry Spencer
henry@spsystems.net