
Re: Some text that may be useful for the update of RFC 2376




"Martin J. Duerst" wrote:
> 
> I came up with this for a different purpose, but Dan Connolly
> suggested it might be added to an update of RFC 2376, as a
> quick overview:

Here is my suggested amendment, which removes dubious wiggle room and
weasel words and makes the result purely deterministic:

> - XML sent (e.g. mail, http) as text/xml (or equivalent, e.g. text/vnd.wap.wml):

as text/"anything" in other words

>   - Charset parameter is strongly recommended

Charset parameter is required if the charset is not UTF-8 or UTF-16

>   - If no charset parameter, default is ASCII. The default of iso-8859-1 in
>     HTTP is explicitly overridden in the specification of the charset
>     parameter in section 3.1 "Text/xml Registration" of RFC 2376
>     (http://www.ietf.org/rfc/rfc2376.txt)

The charset (not default, but THE charset) is UTF-16 (if BOM) or UTF-8 (if
no BOM) and the "default" of iso-8859-1 in HTTP and US-ASCII in mail is
explicitly overridden ...
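
A minimal sketch of that rule in Python, for illustration only. The function
name and signature are hypothetical, and it assumes "BOM" means a UTF-16
byte order mark at the very start of the entity body:

```python
import codecs
from typing import Optional


def charset_for_text_xml(body: bytes, charset_param: Optional[str] = None) -> str:
    """Resolve the charset of a text/* XML entity under the proposed rule."""
    # An explicit charset parameter is authoritative for text/*.
    if charset_param is not None:
        return charset_param.lower()
    # No parameter: the charset is UTF-16 if a UTF-16 BOM is present...
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # ...and UTF-8 otherwise. There is no ASCII or iso-8859-1 fallback.
    return "utf-8"
```

Note that nothing here looks at the encoding declaration, which matches the
point that it is irrelevant for text/*.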

>   - No error handling provisions
>   - An encoding declaration, if present, is irrelevant, but when saving a
>     received resource as a file, the correct encoding declaration should
>     be inserted.

shall be inserted. 

[if the application claims to save as XML rather than as a bunch of
stuff with pointy brackets. If it fails to do so, then the rules for static
storage explain what happens when the file is next parsed: a WF error.]
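
As a rough sketch of what "shall be inserted" could look like in practice
(Python; the function name is hypothetical, and it assumes version 1.0 and
simply drops any standalone declaration, which a real implementation would
want to preserve):

```python
import re


def save_with_declaration(body: bytes, charset: str) -> bytes:
    """Return bytes suitable for static storage, with a correct encoding declaration."""
    # Decode using the charset that was authoritative on the wire.
    text = body.decode(charset).lstrip("\ufeff")
    # Drop any existing XML declaration; its encoding may be wrong or absent.
    text = re.sub(r"^<\?xml\s[^?]*\?>\s*", "", text)
    declaration = f'<?xml version="1.0" encoding="{charset}"?>\n'
    # Encode once so that a BOM, if the codec writes one, appears only once.
    return (declaration + text).encode(charset)
```

The bytes written this way can then be parsed under the static storage rules
without any external metainformation.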

> - XML sent as application/xml (or equivalent):
>   - Charset parameter is strongly recommended, and if present,
>     it takes precedence.

Charset parameter is *disallowed*.

>   - If the charset parameter is omitted, the rules for XML in static storage
>     are followed (see below).

The rules for XML in static storage are followed. Such files may be freely
saved to static storage without modification in all cases.

> - XML in static storage without external metainformation (e.g. file):
>   - Default is UTF-8, or UTF-16 if there is a BOM

For files without an explicit encoding declaration, the file is in UTF-16
if there is a BOM and UTF-8 if there is not.
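
A simplified sketch of the static storage rule in Python. The function name
is hypothetical, and the declaration sniffing assumes an ASCII-compatible
encoding; it does not attempt the full autodetection described in the XML
specification (for example EBCDIC, or UTF-16 without a BOM):

```python
import codecs
import re

# Matches an encoding declaration at the very start of the entity.
_DECL = re.compile(
    rb"<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def encoding_for_stored_xml(data: bytes) -> str:
    """Determine the encoding of an XML file with no external metainformation."""
    # A UTF-16 BOM means the file is in UTF-16.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Skip a UTF-8 signature, if present, before looking for a declaration.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: the file is in UTF-8.
    return "utf-8"
```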

>   - For other things, there has to be an encoding declaration
>   - There is some provision for 'error recovery'. What exactly this
>     means is currently under discussion in the XML Core WG, so that
>     it can be clarified.

"Some provision"????

There is no provision for error recovery, and if a file does not parse for
whatever reason, then it shall be a well-formedness error.
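
To illustrate with Python's standard library parser (used here only as an
example of a conforming-style parser; the helper name is hypothetical):
Latin-1 bytes with no encoding declaration are read as UTF-8, fail to parse,
and are rejected rather than "recovered".

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as XML, False on any parse error."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        # No recovery is attempted: a failure to parse is simply an error.
        return False
    return True


# 0xE9 is "e acute" in ISO-8859-1 but is not valid UTF-8 on its own.
undeclared = b"<a>\xe9</a>"
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
```

Under the rules above, `undeclared` is a well-formedness error, while
`declared` parses because the encoding declaration says what the bytes are.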

--
Chris