Re: Implementing encoded-character
On Wed, 2007-04-04 at 11:23 +0200, Michael Haardt wrote:
> > yes. whitespace is only allowed between hex-pairs. btw, how do you
> > feel about allowing CRLF as well as SPC and TAB between hex-pairs?
>
> Is CRLF allowed inside other ${} expressions (variables)?
variables don't allow any whitespace at all.
> > I don't understand this statement.
>
> The grammar matches words inside the character sequence that makes up a
> string. No matter how much of a word is matched, if it is not complete,
> it will be taken as the literal character sequence. That means you
> need infinite look-ahead.
>
> Before looking at it, I expected that if ${hex: is found, it would be
> an error if it were not followed by arguments and a closing brace.
well, you don't need to backtrack much:
${unicode:cafe ab ab ab ab ab ab
ab ab add ${hex:40 41}}
you just go along, and as soon as you find a syntax "error", you bail
and copy what you've buffered so far verbatim (in this case,
"${unicode: ... add "), then restart the state machine. worst case, the
buffering is the size of the script plus storage for the decoded Unicode
characters while parsing the script.
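
a rough sketch of that bail-and-copy scan, in Python. this is only an
illustration under assumptions, not a reference implementation: the names
try_decode and decode_encoded_characters are made up, blanks are taken to
be SPC, TAB, CR and LF individually, unicode-hex is treated as 1*HEXDIG,
and an out-of-range code point raises an error instead of being copied.

```python
HEXDIGITS = b"0123456789abcdefABCDEF"
BLANKS = b" \t\r\n"  # assumption: SPC, TAB, CR and LF all separate tokens


def try_decode(script, start):
    """Attempt one encoded-character at script[start], which is a "$".

    Returns (decoded_bytes, end) on a match, or (None, fail_pos) at the
    first syntax "error", where fail_pos is where matching gave up.
    """
    for prefix in (b"${hex:", b"${unicode:"):
        if script[start:start + len(prefix)].lower() == prefix:
            break
    else:
        return None, start + 1  # a lone "$": copy it and move on

    pos = start + len(prefix)
    tokens = []
    while pos < len(script) and script[pos] != ord("}"):
        if script[pos] in BLANKS:
            pos += 1
            continue
        tok_start = pos
        while pos < len(script) and script[pos] in HEXDIGITS:
            pos += 1
        if pos == tok_start:
            return None, pos  # neither blank, hex digit nor "}": bail
        if prefix == b"${hex:" and pos - tok_start > 2:
            return None, pos  # a hex-pair is at most two digits: bail
        tokens.append(script[tok_start:pos])
    if pos == len(script) or not tokens:
        return None, pos  # no closing brace, or nothing to decode

    if prefix == b"${hex:":
        return bytes(int(t, 16) for t in tokens), pos + 1

    out = bytearray()
    for t in tokens:
        value = int(t, 16)
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            # assumption: out-of-range values are a hard error
            raise ValueError("invalid Unicode value: %s" % t.decode("ascii"))
        out += chr(value).encode("utf-8")
    return bytes(out), pos + 1


def decode_encoded_characters(script):
    """Decode a bytes string, copying failed matches verbatim."""
    out = bytearray()
    pos = 0
    while pos < len(script):
        if script[pos] != ord("$"):
            out.append(script[pos])
            pos += 1
            continue
        decoded, end = try_decode(script, pos)
        # on a syntax "error", copy what was buffered and restart at end
        out += script[pos:end] if decoded is None else decoded
        pos = end
    return bytes(out)
```

a failed attempt never contains a second "$", so restarting at the point
of failure never rescans anything and the whole pass stays linear.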
> So we have:
>
> "${unicode:200000}" -> error
> "${unicode:2000000}" -> "${unicode:2000000}"
>
> I don't particularly like that, because most likely the second was
> never meant that way. Is there any way to change that at this point?
you want to change unicode-hex to 1*HEXDIG instead? the wording should
already handle it, so it's just the ABNF which needs a tweak. that's
fine with me. I think it's Philip's call.
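
to make the difference concrete, a small sketch comparing the two
grammars on Michael's examples. assumptions: the current rule is taken to
be 1*6HEXDIG (inferred from the examples above), only a single value with
no blanks is handled, surrogates and anything above 10FFFF count as
errors, and the names DRAFT, PROPOSED and classify are made up.

```python
import re

# assumption: current rule, unicode-hex = 1*6HEXDIG
DRAFT = re.compile(r"\$\{unicode:([0-9A-Fa-f]{1,6})\}")
# proposed rule, unicode-hex = 1*HEXDIG
PROPOSED = re.compile(r"\$\{unicode:([0-9A-Fa-f]+)\}")


def classify(pattern, text):
    """Say how a one-value ${unicode:...} string would be treated."""
    match = pattern.fullmatch(text)
    if match is None:
        return "literal"  # grammar does not match: copied verbatim
    value = int(match.group(1), 16)
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return "error"  # grammar matches, but the value is invalid
    return "decoded"


for text in ("${unicode:200000}", "${unicode:2000000}"):
    print(text, classify(DRAFT, text), classify(PROPOSED, text))
```

with 1*HEXDIG both strings match the grammar, so both are caught by the
range check instead of one of them slipping through as a literal.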
--
Kjetil T.