
Re: Implementing encoded-character



> > Hmm, right, variables contain no arguments and we don't have functions
> > yet.  Thinking about string expressions, I certainly would like to
> > have CRLF as white space, but I also would like embedded comments in
> > that case.  Just looking at encoded-character, I see no need for CRLF
> > and even have an odd feeling with, but considering it as syntactic
> > prototype for string expressions, both CRLF and comments sound useful.
>
> I kind of like the idea of things that look like variables but are
> functions operating on the right side of the colon.
>
> We had a bit of discussion in Prague about list expansions that access
> external data sources. This would certainly be one way to handle it,
> though we'd have to be careful about strict vs. lazy evaluation. Anyhow,
> that should probably be the subject of a separate thread.

I suggest looking at Exim and the Exim filter, and their string
expressions, as a live, working example of something very similar.

> > > > "${unicode:200000}" -> error
> > > > "${unicode:2000000}" -> "${unicode:2000000}"
>
> Ugh, if it looks like encoded-char and walks like encoded-char...

That's the point.

> My test implementation left-shifts the current value of the encoded
> character, then adds the next hex digit. When it hits whitespace, it
> checks if the value is within appropriate bounds; if so, stores the
> character then loops, if not, stores '?' then loops. Would we really
> rather be very strict about this? I'm in favor of some flexibility.

If you do this in C with unsigned integers, or with signed ones on a
common architecture where overflow is silently ignored, you already
have a problem: with enough hex digits the value wraps around and may
land back inside the valid range. You could stop aggregating after the
sixth nibble, keep parsing any further digits, and then generate an
overflow error if there really are more.
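
To illustrate, here is a minimal sketch of that approach. It is not
taken from any existing implementation; the function name, the status
codes and the exact bounds (at most 0x10FFFF, surrogates D800-DFFF
excluded) are my assumptions. Note that it counts nibbles, so leading
zeros count towards the limit of six.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum hex_status { HEX_OK, HEX_EMPTY, HEX_OVERFLOW, HEX_OUT_OF_RANGE };

/*
 * Hypothetical helper: read a run of hex digits from s.
 * Only the first six nibbles are aggregated, so the value can never
 * wrap around; further digits are consumed but not added.
 * *consumed receives the number of hex digits read.
 */
static enum hex_status
parse_hex_codepoint(const char *s, size_t *consumed, unsigned long *value)
{
    unsigned long v = 0;
    size_t n = 0;

    while (isxdigit((unsigned char)s[n])) {
        if (n < 6) {
            int c = tolower((unsigned char)s[n]);
            v = (v << 4) | (unsigned long)(c <= '9' ? c - '0' : c - 'a' + 10);
        }
        n++;
    }
    *consumed = n;
    *value = v;

    if (n == 0)
        return HEX_EMPTY;          /* no hex digit at all */
    if (n > 6)
        return HEX_OVERFLOW;       /* more than six nibbles */
    /* Assumed bounds: Unicode range without surrogates. */
    if (v > 0x10FFFFUL || (v >= 0xD800UL && v <= 0xDFFFUL))
        return HEX_OUT_OF_RANGE;
    return HEX_OK;
}
```

With this, "200000" is rejected as out of range and "2000000" as an
overflow; whether the caller turns either into an error or leaves the
string untouched is then a question for the specification.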

I suggest making the specification a bit more flexible and having
implementations obey it strictly.

Michael