
Re: UTF-8 iCalendar bug solution



On Thu, 27 Feb 2003 09:30:35 -0500, Mark Swanson wrote
> On February 27, 2003 07:50 am, Mike Higginbottom wrote:
> >
> > P.S. As an aside, I think I missed a step in the discussion of why we want
> > to extend NON-US-ASCII to 0x80-0xFF.  UTF-8 specifically precludes use of
> > 0xFE and 0xFF so on that basis we should extend only to 0x80-0xFD.  Are we
> > specifying 0xFF to cover the Latin-1 charset that appears to be in common
> > usage?
> 
> I like it because it covers the Latin-1 charset, which appears to be 
> in common use.
> 
> I just thought of another reason why the definition of NON-US-ASCII 
> is broken: if the MIME charset, or a CHARSET property/parameter, or 
> even making use of Section 2.3 to use whatever charset/codepoints 
> you like requires 0xf9-0xff, then you are forced to create illegal 
> iCalendar and will not be able to interoperate.
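A minimal Python sketch of the failure mode in the quoted paragraph, assuming the NON-US-ASCII production covers %x80-F8 (so that, as the quoted text says, 0xF9-0xFF are excluded) and that it is applied literally to the raw octets of an object. The function name illegal_octets is hypothetical, chosen only for illustration:

```python
# Hypothetical octet-level check: the NON-US-ASCII production is assumed to
# cover %x80-F8, so 0xF9-0xFF cannot appear in a conforming object.
NON_US_ASCII = range(0x80, 0xF9)  # 0x80..0xF8 inclusive


def illegal_octets(data: bytes) -> list[int]:
    """Return the high-bit octets in `data` that NON-US-ASCII does not allow."""
    return [b for b in data if b >= 0x80 and b not in NON_US_ASCII]


# Latin-1 stores "ü" as the single octet 0xFC, outside %x80-F8.
print([hex(b) for b in illegal_octets("Grüße".encode("latin-1"))])  # ['0xfc']
# UTF-8 stores "ü" as 0xC3 0xBC and "ß" as 0xC3 0x9F, all inside the range.
print(illegal_octets("Grüße".encode("utf-8")))  # []
```

The same text is legal or illegal depending only on which charset produced the octets, which is the interoperability problem described above.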

If you think of the various productions that specify codepoints as abstract,
then they can be thought of as giving examples in the default (UTF-8)
character set, and as not binding on iCalendar objects in other character
sets.  However, I find little or no support in 2445 for such an
interpretation, which means NON-US-ASCII (and the exclusion of the C0 control
set (0x00-0x1F) plus DEL (0x7F)) could use some clarifying text.
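Under the "abstract" reading described above, the grammar's exclusions would apply to decoded characters rather than to octets. A rough sketch of that reading, assuming the charset name comes from the MIME charset or a CHARSET parameter and defaults to UTF-8; the helper name disallowed_chars is hypothetical:

```python
# Sketch of the "abstract code point" reading: decode with the declared
# charset first, then apply the exclusion of C0 controls and DEL to the
# resulting characters instead of to raw octets.
def disallowed_chars(data: bytes, charset: str = "utf-8") -> list[str]:
    """Return characters in U+0000-U+001F or U+007F (DEL) found in `data`."""
    text = data.decode(charset)  # raises UnicodeDecodeError on bad input
    return [ch for ch in text if ord(ch) <= 0x1F or ord(ch) == 0x7F]


# The Latin-1 octet 0xFC would fail an octet-level %x80-F8 check, but it
# decodes to "ü", which this reading accepts.
print(disallowed_chars("Grüße".encode("latin-1"), "latin-1"))  # []
print(disallowed_chars(b"bad\x07bell"))  # ['\x07']
```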

It is my impression that the authors of 2445 found this as complex and
annoying an issue as we do, and decided to put it off for later.  My evidence
for this is in sec. 4.3.11 (quoted in part below); it appears that iCalendar
2.0 supports Latin-1 only where it overlaps with 7-bit ASCII.
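The overlap mentioned above can be checked directly with Python's standard codecs: a single octet decodes to the same character under US-ASCII, UTF-8 and Latin-1 only in the 7-bit range. A small sketch (same_in_all is a hypothetical helper name):

```python
# A single octet decodes to the same character under US-ASCII, UTF-8 and
# Latin-1 only if it lies in the 7-bit range; every high-bit octet is
# rejected by the ASCII codec (and, on its own, by the UTF-8 codec too).
def same_in_all(octet: int) -> bool:
    one = bytes([octet])
    try:
        return one.decode("ascii") == one.decode("utf-8") == one.decode("latin-1")
    except UnicodeDecodeError:
        return False


agree = [b for b in range(256) if same_in_all(b)]
print(agree == list(range(0x80)))  # True
```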

    /cco

4.3.11 Text

   Value Name: TEXT

   Purpose: This value type is used to identify values that contain human
   readable text.

   Formal Definition: The character sets supported by this revision of
   iCalendar are UTF-8 and US ASCII thereof. The applicability to other
   character sets is for future work. The value type is defined by the
   following notation.

--
GPG Key Fingerprint: B375 A4E7 752B DB8C 4359  852E C3CF BF64 379A E9B2
Debian Project (http://www.debian.org)