Re: Implementing encoded-character




> On Wed, 2007-04-04 at 14:25 +0200, Michael Haardt wrote:
> > > > > yes.  whitespace is only allowed between hex-pairs.  btw, how do you
> > > > > feel about allowing CRLF as well as SPC and TAB between hex-pairs?
> >
> > > > Is CRLF allowed inside other ${} expressions (variables)?
> >
> > > variables don't allow any whitespace at all.
> >
> > Hmm, right, variables contain no arguments and we don't have functions
> > yet.  Thinking about string expressions, I certainly would like to
> > have CRLF as white space, but I also would like embedded comments in
> > that case.  Just looking at encoded-character, I see no need for CRLF
> > and even have an odd feeling with, but considering it as syntactic
> > prototype for string expressions, both CRLF and comments sound useful.

> I kind of like the idea of things that look like variables but are
> functions operating on the right side of the colon.

> We had a bit of discussion in Prague about list expansions that access
> external data sources. This would certainly be one way to handle it,
> though we'd have to be careful about strict vs. lazy evaluation. Anyhow,
> that should probably be the subject of a separate thread.

> > > > Before looking at it, I expected that if ${hex: is found, it would be
> > > > an error if it were not followed by arguments and a closing brace.
> > >
> > > well, you don't need to backtrack much:
> >
> > It's no problem really, just confusing.  If someone starts to write
> > ${hex:, most likely he meant to encode data.  Only CS people think stuff
> > like "it's not a word of the grammar, thus of course being a literal
> > as specified". ;-)

> I agree, it'd be confusing for that to happen.

> [snip]

> > > > "${unicode:200000}" -> error
> > > > "${unicode:2000000}" -> "${unicode:2000000}"

> Ugh, if it looks like encoded-char and walks like encoded-char...

> My test implementation left-shifts the current value of the encoded
> character, then adds the next hex digit. When it hits whitespace, it
> checks if the value is within appropriate bounds; if so, stores the
> character then loops, if not, stores '?' then loops. Would we really
> rather be very strict about this? I'm in favor of some flexibility.

You need to strictly implement the grammar in the specification, whatever
that ends up being. Any flexibility will allow someone to write one of
these things that works in your implementation but silently fails and causes
weird results elsewhere.
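
As a rough sketch of what strict handling could look like (Python, and
only an illustration): the grammar below is an assumption inferred from the
two examples quoted above, namely one to six hex digits per character with
SPC or TAB between them. It is not taken from the draft, and the names
expand_unicode and EncodedCharError are made up for this example. Text that
does not match the grammar stays a literal; text that matches but names an
invalid code point is an error, never a '?'.

```python
import re

# Assumed grammar (hypothetical, inferred from the examples in this thread):
#   "${unicode:" *WSP 1*6HEXDIG *(1*WSP 1*6HEXDIG) *WSP "}"
# ABNF quoted strings are case-insensitive, hence re.IGNORECASE.
_UNICODE_RE = re.compile(
    r"\$\{unicode:[ \t]*"
    r"([0-9A-Fa-f]{1,6}(?:[ \t]+[0-9A-Fa-f]{1,6})*)"
    r"[ \t]*\}",
    re.IGNORECASE,
)


class EncodedCharError(ValueError):
    """The text matched the grammar but names an invalid code point."""


def _decode(match):
    chars = []
    for token in match.group(1).split():
        value = int(token, 16)
        # Strict: out-of-range values and surrogates are errors.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise EncodedCharError("invalid code point in " + match.group(0))
        chars.append(chr(value))
    return "".join(chars)


def expand_unicode(text):
    """Replace every well-formed ${unicode:...} sequence in text.

    Anything that does not match the assumed grammar is left untouched.
    """
    return _UNICODE_RE.sub(_decode, text)
```

With this, "${unicode:48 69}" expands to "Hi", "${unicode:2000000}" is left
alone because seven digits do not match the assumed grammar, and
"${unicode:200000}" raises EncodedCharError.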

Past experience with RFC 2047 encoded-words has shown that allowing leeway in
these situations is a curse, not a blessing.

					ned