Re: Escaping things in element names
This is a point worthy of contemplation.
Although the given email header is valid, I've never seen such headers used
for real. So I think the question needs to be asked: is the goal here to
represent every conceivable email message, or to provide a framework for
representing the 99.9...% of actual emails?
My intention is to use this format in conjunction with a header registry
proposal [1], which I understand is pretty much on course to become
BCP. In that proposal, we recommend:
[[
A registered field name SHOULD conform at least to the syntax defined
by RFC 2822 [4], section 3.6.8.
Further, the "." character is reserved to indicate a naming sub-
structure and MUST NOT be included in any registered field name.
Currently, no specific sub-structure is defined; if used, any such
structure MUST be defined by a standards track RFC document.
Header field names may sometimes be used in URIs, URNs and/or XML.
To comply with the syntactic constraints of these forms, it is
recommended that characters in a registered field name are restricted
to those that can be used without escaping in a URI [26] or URN [17],
and that are also legal in XML [40] element names.
Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.
]]
This does not of itself prevent use of weird header field names, but it
indicates an expectation that mainstream header fields will be very much
more constrained than the full range permitted by RFC 2822. I would hope
that when this becomes BCP, the use of strange header field names will be
further discouraged.
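To make the gap between the two rules concrete, here is a rough sketch in Python that checks a field name against both. The function names are mine, not from either document, and I am assuming "letters" in the registry text means ASCII letters only.

```python
import re

# Registry recommendation: letters, digits, '-' and '_',
# with the first character a letter or underscore.
# Assumption: "letters" means ASCII letters only.
_RECOMMENDED = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_rfc2822_field_name(name: str) -> bool:
    """True if every character is printable US-ASCII (33..126) and not ':'."""
    return bool(name) and all(33 <= ord(c) <= 126 and c != ":" for c in name)


def is_recommended_field_name(name: str) -> bool:
    """True if the name fits the narrower registry recommendation."""
    return _RECOMMENDED.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["Subject", "X-Face", r"X-_///<&;<<-\\\%"]:
        print(name, is_rfc2822_field_name(name), is_recommended_field_name(name))
```

Only names that pass the second check can be used directly as XML element names without any escaping.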
For myself, I would be content if this format does not accommodate header
fields that use unusual characters. This is a simple case that deals with
most actual usage.
But if others think it is important to accommodate other header field
names, then I can think of some possible approaches:
(I'll use this example "Hello!;_world: ...")
(1) use _ as an escape character in the corresponding element name, such
that _hh indicates a field name character by its hex code, and __ indicates
a single _.
e.g. <ns:Hello_21_3B__world> ... </ns:Hello_21_3B__world>
(2) have a special header field name that specifies the actual name as an
attribute:
e.g. <ns:special name="Hello!;_world"> ... </ns:special>
This raises the question of how to deal with different namespaces. Maybe:
e.g. <xmlmsg:special ns="http://mynamespace/" name="Hello!;_world">
... </xmlmsg:special>
(3) I contemplated using a different namespace with implicit encoding. Messy.
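Approaches (1) and (2) could be sketched roughly as follows, in Python. This is only an illustration under my own assumptions: the function names and the namespace URI http://example.org/xmlmsg are made up, hex digits are written in upper case as in the example above, and a leading digit or hyphen is also escaped, because an XML element name cannot start with one.

```python
import string
import xml.etree.ElementTree as ET

_LETTERS = set(string.ascii_letters)
_PLAIN = _LETTERS | set(string.digits) | {"-"}


def encode_field_name(name: str) -> str:
    """Approach (1): '_' becomes '__', other unsafe characters become '_HH'."""
    out = []
    for i, ch in enumerate(name):
        if ch == "_":
            out.append("__")
        elif ch in _LETTERS or (i > 0 and ch in _PLAIN):
            out.append(ch)
        else:
            # Field name characters are US-ASCII 33..126, so two hex digits suffice.
            out.append("_%02X" % ord(ch))
    return "".join(out)


def decode_element_name(local: str) -> str:
    """Reverse of encode_field_name; raises ValueError on a malformed escape."""
    out = []
    i = 0
    while i < len(local):
        if local[i] != "_":
            out.append(local[i])
            i += 1
        elif local[i + 1:i + 2] == "_":
            out.append("_")
            i += 2
        else:
            hex_pair = local[i + 1:i + 3]
            if len(hex_pair) != 2:
                raise ValueError("truncated escape in %r" % local)
            out.append(chr(int(hex_pair, 16)))
            i += 3
    return "".join(out)


def special_element(name: str, body: str, ns: str = "http://example.org/xmlmsg"):
    """Approach (2): carry the real field name in a 'name' attribute.

    The namespace URI is a placeholder; the serializer handles escaping
    of '<' and '&' in the attribute value.
    """
    el = ET.Element("{%s}special" % ns, {"name": name})
    el.text = body
    return el


if __name__ == "__main__":
    print(encode_field_name("Hello!;_world"))
    print(ET.tostring(special_element("Hello!;_world", "..."), encoding="unicode"))
```

With (1) the element name stays self-describing but is hard to read; with (2) the name is readable, at the cost of every unusual header sharing a single element type.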
Any more ideas? Do we really care?
#g
--
[1] http://www.ietf.org/internet-drafts/draft-klyne-msghdr-registry-06.txt
At 03:47 PM 2/12/03 +0000, Chris Croome wrote:
Hi
One of the guys I work with has pointed out that this is a valid
email header:
X-_///<&;<<-\\\%: Hello World
I can see how to do this in XHTML for HTTP headers:
<meta
http-equiv="X-_///<&;<<-\\\%"
content="Hello World"
/>
I don't know how it would look as an element rather than an
attribute.
Chris
--
Chris Croome <chris@xxxxxxxxxxxxxxxxxxx>
web design http://www.webarchitects.co.uk/
web content management http://mkdoc.com/
everything else http://chris.croome.net/
Return-Path: <bruno@xxxxxxxxxxxxxxxxxxx>
Delivered-To: chris@xxxxxxxxxxxxxxxxxxx
Received: from celery.webarchitects.co.uk (mkdoc.demon.co.uk [62.49.20.1])
by mail.webarchitects.co.uk (Postfix) with ESMTP id 611EA2B125
for <chris@xxxxxxxxxxxxxxxxxxx>; Wed, 12 Feb 2003 15:19:56 +0000
(GMT)
Received: by celery.webarchitects.co.uk (Postfix, from userid 500)
id A653A396; Wed, 12 Feb 2003 15:19:55 +0000 (GMT)
Date: Wed, 12 Feb 2003 15:19:55 +0000
From: Bruno Postle <bruno@xxxxxxxxxxxxxxxxxxx>
To: Chris Croome <chris@xxxxxxxxxxxxxxxxxxx>
Subject: Re: Fwd: [chris@xxxxxxxxxxxxxxxxxxx: Re: XML message format draft]
Message-ID: <20030212151955.GQ19991@xxxxxxxxxxxxxxxxxxx>
References: <20030210130958.GA14335@xxxxxxxxxxxxxxxxxxx>
<20030211110202.GJ19991@xxxxxxxxxxxxxxxxxxx>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030211110202.GJ19991@xxxxxxxxxxxxxxxxxxx>
X-_///<&;<<-\\\%: Hello World
X-Face: +?je=\K3V2%NTRI}N}l]g2gi?$L.3k,XskrYQ\7
Organisation: http://www.webarchitects.co.uk/
User-Agent: Mutt/1.5.3i
On Tue 11-Feb-2003 at 11:02:02AM +0000, Bruno Postle wrote:
>
> This is the rfc: http://www.faqs.org/rfcs/rfc2822.html
>
> 2.2. Header Fields
>
> Header fields are lines composed of a field name, followed by a colon
> (":"), followed by a field body, and terminated by CRLF. A field
> name MUST be composed of printable US-ASCII characters (i.e.,
> characters that have values between 33 and 126, inclusive), except
> colon. A field body may be composed of any US-ASCII characters,
> except for CR and LF. However, a field body may contain CRLF when
> used in header "folding" and "unfolding" as described in section
> 2.2.3. All field bodies MUST conform to the syntax described in
> sections 3 and 4 of this standard.
That means that both of these are perfectly valid email headers:
Subject: Hello World
X-_///<&;<<-\\\%: Hello World
--
Bruno
-------------------
Graham Klyne
<GK@xxxxxxxxxxxxxx>