
Re: Escaping things in element names




This is a point worthy of contemplation.


Although the given email header is valid, I've never seen such headers used for real. So I think the question needs to be asked: is the goal here to represent every conceivable email message, or to provide a framework for representing the 99.9...% of actual emails?

My intention is to use this format in conjunction with a header registry proposal [1], which I understand is pretty much on course to become BCP. In that proposal, we recommend:
[[
   A registered field name SHOULD conform at least to the syntax defined
   by RFC 2822 [4], section 3.6.8.

   Further, the "." character is reserved to indicate a naming sub-
   structure and MUST NOT be included in any registered field name.
   Currently, no specific sub-structure is defined; if used, any such
   structure MUST be defined by a standards track RFC document.

   Header field names may sometimes be used in URIs, URNs and/or XML.
   To comply with the syntactic constraints of these forms, it is
   recommended that characters in a registered field name are restricted
   to those that can be used without escaping in a URI [26] or URN [17],
   and that are also legal in XML [40] element names.

   Thus, for maximum flexibility, header field names SHOULD further be
   restricted to just letters, digits, hyphen ('-') and underscore ('_')
   characters, with the first character being a letter or underscore.
]]
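
As a rough illustration of that last recommendation, here is a minimal Python sketch of a check for the restricted character set. The pattern is my own reading of the quoted paragraph, and the function name is purely illustrative; neither comes from the draft itself.

```python
import re

# Assumption: this pattern is one reading of the recommendation quoted above
# (letters, digits, '-' and '_', with a letter or underscore first).
RESTRICTED_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_restricted_field_name(name: str) -> bool:
    """Return True if name uses only the restricted character set."""
    return RESTRICTED_NAME.fullmatch(name) is not None


print(is_restricted_field_name("Content-Type"))
print(is_restricted_field_name("Hello!;_world"))
```

A name that passes this check should be usable directly as an XML element name, which is the simple case discussed in this message.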

This does not of itself prevent use of weird header field names, but it indicates an expectation that mainstream header fields will be much more constrained than the full range permitted by RFC 2822. I would hope that when this becomes BCP, the use of strange header field names will be further discouraged.

For myself, I would be content if this format does not accommodate header fields that use unusual characters. This is a simple case that deals with most actual usage.

But if others think it is important to accommodate other header field names, then I can think of some possible approaches:

(I'll use this example "Hello!;_world: ...")

(1) use _ as an escape character in the corresponding element name, such that _hh indicates a field name character by its hex code, and __ indicates a single _.

e.g. <ns:Hello_21_3B__world> ... </ns:Hello_21_3B__world>

(2) have a special header field name that specifies the actual name as an attribute:

e.g. <ns:special name="Hello!;_world"> ... </ns:special>

This raises the question of how to deal with different namespaces. Maybe:

e.g. <xmlmsg:special ns="http://mynamespace/" name="Hello!;_world"> ... </xmlmsg:special>

(3) I contemplated using a different namespace with implicit encoding. Messy.
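
To make approaches (1) and (2) concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not a proposed implementation. The function names are hypothetical. For (1), I assume that every character other than a letter, digit or hyphen is escaped, and that a leading digit or hyphen is escaped too so the result can start an XML name; that last rule is my addition and is not part of the description above. For (2), namespaces are left out to keep the example short.

```python
import string
import xml.etree.ElementTree as ET

# Characters copied through unchanged (assumption: letters, digits, hyphen).
SAFE = set(string.ascii_letters + string.digits + "-")


def escape_field_name(name: str) -> str:
    """Approach (1): '_' becomes '__', other unsafe characters become '_hh'."""
    out = []
    for i, ch in enumerate(name):
        if not 33 <= ord(ch) <= 126 or ch == ":":
            raise ValueError("not a valid RFC 2822 field name character: %r" % ch)
        if ch == "_":
            out.append("__")
        elif ch in SAFE and (i > 0 or ch.isalpha()):
            out.append(ch)
        else:
            # Also reached for a leading digit or hyphen (my own addition).
            out.append("_%02X" % ord(ch))
    return "".join(out)


def unescape_field_name(local: str) -> str:
    """Reverse escape_field_name."""
    out = []
    i = 0
    while i < len(local):
        if local[i] != "_":
            out.append(local[i])
            i += 1
        elif local.startswith("__", i):
            out.append("_")
            i += 2
        else:
            out.append(chr(int(local[i + 1:i + 3], 16)))
            i += 3
    return "".join(out)


def special_element(name: str, body: str) -> str:
    """Approach (2): carry the real field name in a 'name' attribute."""
    elem = ET.Element("special", {"name": name})
    elem.text = body
    return ET.tostring(elem, encoding="unicode")


print(escape_field_name("Hello!;_world"))  # Hello_21_3B__world
print(special_element("Hello!;_world", "..."))
```

Because '_' is not a hex digit, '__' and '_hh' cannot be confused when decoding, so the escaped form round-trips. In approach (2) the serializer takes care of XML attribute escaping, so characters such as '<' and '&' in the field name need no special handling by the caller.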

Any more ideas? Do we really care?

#g
--


[1] http://www.ietf.org/internet-drafts/draft-klyne-msghdr-registry-06.txt





At 03:47 PM 2/12/03 +0000, Chris Croome wrote:
Hi

One of the guys I work with has pointed out that this is a valid
email header:

X-_///<&;<<-\\\%: Hello World

I can see how to do this in XHTML for HTTP headers:

  <meta
    http-equiv="X-_///&lt;&amp;;&lt;&lt;-\\\%"
    content="Hello World"
  />

I don't know how it would look as an element rather than an
attribute.

Chris

--
Chris Croome <chris@xxxxxxxxxxxxxxxxxxx>
web design http://www.webarchitects.co.uk/
web content management http://mkdoc.com/
everything else http://chris.croome.net/
Return-Path: <bruno@xxxxxxxxxxxxxxxxxxx>
Delivered-To: chris@xxxxxxxxxxxxxxxxxxx
Received: from celery.webarchitects.co.uk (mkdoc.demon.co.uk [62.49.20.1])
by mail.webarchitects.co.uk (Postfix) with ESMTP id 611EA2B125
for <chris@xxxxxxxxxxxxxxxxxxx>; Wed, 12 Feb 2003 15:19:56 +0000 (GMT)
Received: by celery.webarchitects.co.uk (Postfix, from userid 500)
id A653A396; Wed, 12 Feb 2003 15:19:55 +0000 (GMT)
Date: Wed, 12 Feb 2003 15:19:55 +0000
From: Bruno Postle <bruno@xxxxxxxxxxxxxxxxxxx>
To: Chris Croome <chris@xxxxxxxxxxxxxxxxxxx>
Subject: Re: Fwd: [chris@xxxxxxxxxxxxxxxxxxx: Re: XML message format draft]
Message-ID: <20030212151955.GQ19991@xxxxxxxxxxxxxxxxxxx>
References: <20030210130958.GA14335@xxxxxxxxxxxxxxxxxxx> <20030211110202.GJ19991@xxxxxxxxxxxxxxxxxxx>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030211110202.GJ19991@xxxxxxxxxxxxxxxxxxx>
X-_///<&;<<-\\\%: Hello World
X-Face: +?je=\K3V2%NTRI}N}l]g2gi?$L.3k,XskrYQ\7
Organisation: http://www.webarchitects.co.uk/
User-Agent: Mutt/1.5.3i


On Tue 11-Feb-2003 at 11:02:02AM +0000, Bruno Postle wrote:
>
> This is the rfc: http://www.faqs.org/rfcs/rfc2822.html
>
> 2.2. Header Fields
>
>    Header fields are lines composed of a field name, followed by a colon
>    (":"), followed by a field body, and terminated by CRLF.  A field
>    name MUST be composed of printable US-ASCII characters (i.e.,
>    characters that have values between 33 and 126, inclusive), except
>    colon.  A field body may be composed of any US-ASCII characters,
>    except for CR and LF.  However, a field body may contain CRLF when
>    used in header "folding" and  "unfolding" as described in section
>    2.2.3.  All field bodies MUST conform to the syntax described in
>    sections 3 and 4 of this standard.

That means that both of these are perfectly valid email headers:

Subject: Hello World

X-_///<&;<<-\\\%: Hello World
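
A minimal Python sketch of the section 2.2 rule quoted above, for checking candidate field names. The function name is illustrative only, and the non-empty requirement is taken from the RFC 2822 grammar (section 3.6.8) rather than from the quoted paragraph.

```python
def is_valid_field_name(name: str) -> bool:
    """Printable US-ASCII (values 33 to 126), no colon, at least one character."""
    return bool(name) and all(33 <= ord(c) <= 126 and c != ":" for c in name)


for candidate in ("Subject", r"X-_///<&;<<-\\\%", "Bad Name"):
    print(candidate, is_valid_field_name(candidate))
```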

--
Bruno

------------------- Graham Klyne <GK@xxxxxxxxxxxxxx>