
Re: Implementing encoded-character



On Wed, 2007-04-04 at 11:23 +0200, Michael Haardt wrote:
> > yes.  whitespace is only allowed between hex-pairs.  btw, how do you
> > feel about allowing CRLF as well as SPC and TAB between hex-pairs?
> 
> Is CRLF allowed inside other ${} expressions (variables)?

variable references don't allow any whitespace at all.

> > I don't understand this statement.
> 
> The grammar matches words inside the character sequence that makes up a
> string.  No matter how much of a word is matched, if it is not complete,
> it will be taken as the literal character sequence.  That means you
> need infinite look-ahead.
> 
> Before looking at it, I expected that if ${hex: is found, it would be
> an error if it were not followed by arguments and a closing brace.

well, you don't need to backtrack much:

  ${unicode:cafe ab ab ab ab ab ab
     ab ab add ${hex:40 41}}

you just go along, and as soon as you find a syntax "error", you bail
and copy what you've buffered so far verbatim (in this case,
"${unicode: ... add "), then restart the state machine at the character
that caused the failure.  in the worst case, you buffer the whole script
plus storage for the decoded Unicode characters.
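a minimal Python sketch of that scan-and-bail approach, under some assumptions of mine: hex-pair is 1*2HEXDIG, unicode-hex is 1*6HEXDIG (which the 7-digit example in this thread implies), blanks are SP/TAB plus the proposed CRLF, blanks are also tolerated right after the colon and before the closing brace, and the keywords match case-insensitively as ABNF string literals do.  instead of keeping a separate buffer it re-slices the input, which amounts to the same thing.  all function names are hypothetical.  output is octets, since ${hex:} can produce raw bytes; an out-of-range code point raises ValueError only once the closing brace has been seen, so an unterminated expression stays literal.

```python
HEXDIGITS = "0123456789abcdefABCDEF"

def _skip_blanks(s, j, allow_crlf):
    """Return the index just past any run of SP/TAB (and CRLF if allowed)."""
    while j < len(s):
        if s[j] in " \t":
            j += 1
        elif allow_crlf and s.startswith("\r\n", j):
            j += 2
        else:
            break
    return j

def _try_encoded(s, i, allow_crlf, max_unicode_digits):
    """Try to parse an encoded-character starting at s[i] (which is "${").

    Returns (octets, end) on success, or (None, fail) where s[i:fail] is
    copied verbatim and scanning restarts at s[fail].
    """
    head = s[i:i + 10].lower()              # keywords are case-insensitive
    if head.startswith("${hex:"):
        pos, max_digits, is_hex = i + 6, 2, True       # hex-pair = 1*2HEXDIG
    elif head.startswith("${unicode:"):
        pos, max_digits, is_hex = i + 10, max_unicode_digits, False
    else:
        return None, i + 2                  # some other "${", e.g. a variable
    j = _skip_blanks(s, pos, allow_crlf)
    values = []
    while True:
        start = j
        while (j < len(s) and s[j] in HEXDIGITS
               and (max_digits is None or j - start < max_digits)):
            j += 1
        if j == start:
            return None, j                  # expected a hex token here
        values.append(int(s[start:j], 16))
        k = _skip_blanks(s, j, allow_crlf)
        if k < len(s) and s[k] == "}":
            break
        if k == j:
            return None, j                  # token too long or followed by junk
        j = k
    if is_hex:
        return bytes(values), k + 1
    # Range errors only count once the whole expression has parsed.
    for cp in values:
        if not (cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF):
            raise ValueError(f"${{unicode:...}} value {cp:X} is out of range")
    return "".join(map(chr, values)).encode("utf-8"), k + 1

def decode_encoded_characters(s, allow_crlf=True, max_unicode_digits=6):
    """Decode ${hex:...} and ${unicode:...} in a Sieve string, returning octets."""
    out = bytearray()
    i = 0
    while True:
        j = s.find("${", i)
        if j < 0:
            out += s[i:].encode("utf-8")
            return bytes(out)
        out += s[i:j].encode("utf-8")       # plain text before "${"
        decoded, end = _try_encoded(s, j, allow_crlf, max_unicode_digits)
        # Success: append decoded octets.  Syntax failure: bail, copy the
        # consumed prefix verbatim, and restart at the offending character.
        out += decoded if decoded is not None else s[j:end].encode("utf-8")
        i = end
```

on the example above, the ${unicode: part fails at the "$" of "${hex:", gets copied verbatim, and only "${hex:40 41}" turns into "@A".  with allow_crlf=False, "${hex:41\r\n42}" stays literal instead of decoding to "AB", which is exactly what the CRLF question decides.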

> So we have:
> 
> "${unicode:200000}" -> error
> "${unicode:2000000}" -> "${unicode:2000000}"
> 
> I don't particularly like that, because most likely the second was
> never meant that way.  Is there any way to change that at this point?

you want to change unicode-hex to 1*HEXDIG instead?  the wording should
already handle it, so it's just the ABNF which needs a tweak.  that's
fine with me.  I think it's Philip's call.
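to show what the ABNF tweak changes, here is a hypothetical regex-based classifier for one complete ${unicode:...} expression, run under the current grammar (assumed to be 1*6HEXDIG, going by the example) and under the proposed 1*HEXDIG.  blanks are limited to SP and TAB for brevity; unicode_pattern and classify are illustrative names only.

```python
import re

BLANK = r"[ \t]"  # SP / TAB only, for brevity

def unicode_pattern(max_digits):
    """Regex for "${unicode:" unicode-hex-seq "}".

    max_digits=6 models unicode-hex = 1*6HEXDIG; None models 1*HEXDIG.
    """
    hexrun = f"[0-9A-Fa-f]{{1,{max_digits}}}" if max_digits else "[0-9A-Fa-f]+"
    seq = rf"{BLANK}*{hexrun}(?:{BLANK}+{hexrun})*{BLANK}*"
    # ABNF string literals are case-insensitive, hence IGNORECASE.
    return re.compile(r"\$\{unicode:" + seq + r"\}", re.IGNORECASE)

def classify(expr, max_digits):
    """Return ("literal", expr), ("error", None) or ("decoded", text)."""
    if unicode_pattern(max_digits).fullmatch(expr) is None:
        return ("literal", expr)       # not an encoded-character: kept verbatim
    code_points = [int(tok, 16) for tok in expr[len("${unicode:"):-1].split()]
    if any(not (cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF) for cp in code_points):
        return ("error", None)         # syntactically valid but out of range
    return ("decoded", "".join(map(chr, code_points)))

for grammar in (6, None):
    print(grammar,
          classify("${unicode:200000}", grammar),
          classify("${unicode:2000000}", grammar))
```

with 1*6HEXDIG the 7-digit value never matches the grammar, so it silently stays literal; with 1*HEXDIG it matches and then fails the range check, so both examples become errors.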

-- 
Kjetil T.