Re: Implementing encoded-character
> > "${hex:40" -> "${hex:40"
> > "${hex: 40 }" -> "${hex: 40 }"
>
> yes. whitespace is only allowed between hex-pairs. btw, how do you
> feel about allowing CRLF as well as SPC and TAB between hex-pairs?
Is CRLF allowed inside other ${} expressions (variables)?
> > "${unicode:40}" -> "${unicode:40}"
>
> no, this is "@".
Good thing I asked. I just reread RFC 2234 and found out that I
have to read 1*6HEXDIG as 1*6(HEXDIG), not as 1*(6HEXDIG).
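To make the two readings of the repetition concrete, here is a minimal
sketch using Python regular expressions. The names HEXDIG, one_to_six and
groups_of_six are made up for illustration; only the ABNF notation comes
from RFC 2234.

```python
import re

# HEXDIG from RFC 2234; ABNF literals are case-insensitive, so a-f is accepted too.
HEXDIG = "[0-9A-Fa-f]"

# 1*6HEXDIG read as 1*6(HEXDIG): between one and six hex digits.
one_to_six = re.compile(rf"{HEXDIG}{{1,6}}")

# The misreading 1*(6HEXDIG): one or more groups of exactly six hex digits.
groups_of_six = re.compile(rf"(?:{HEXDIG}{{6}})+")

for text in ("40", "020000", "0020000"):
    print(text,
          bool(one_to_six.fullmatch(text)),
          bool(groups_of_six.fullmatch(text)))
```

Under the first reading "40" is a valid unicode-hex; under the second it
would not be, which is why "${unicode:40}" would have stayed literal.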
> > There is no word of the encoded-character grammar inside the string,
> > taking everything literal.
> I don't understand this statement.
The grammar matches words inside the character sequence that makes up a
string. No matter how much of a word is matched, if it is not complete,
it will be taken as the literal character sequence. That means you
need infinite look-ahead.
Before looking at it, I expected that if ${hex: is found, it would be
an error if it were not followed by arguments and a closing brace.
> > "${hex:40${hex:40}}" -> "${hex:40$}"
>
> no, "${hex:40@}"
Oops, I meant to write @. But you agree with my interpretation of how
things are processed.
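The processing discussed above (only a complete ${hex:...} word is
replaced, anything incomplete stays literal) could be sketched roughly as
follows. This is an illustration under assumptions, not the draft's
grammar: it assumes a hex-pair is one or two hex digits and that only SPC
and TAB may appear between pairs. HEX_SEQ and decode_hex are hypothetical
names. The regex engine's backtracking stands in for the unbounded
look-ahead.

```python
import re

# Assumed grammar: "${hex:" hex-pair *(1*(SP / HTAB) hex-pair) "}"
# Assumption: a hex-pair is one or two hex digits; no whitespace is
# allowed directly after the colon or directly before the closing brace.
HEX_SEQ = re.compile(
    rb"\$\{hex:([0-9A-Fa-f]{1,2}(?:[ \t]+[0-9A-Fa-f]{1,2})*)\}"
)

def decode_hex(data: bytes) -> bytes:
    """Replace every complete ${hex:...} word with the octets it encodes.

    Anything that does not match the whole word is left untouched, and
    the output is not rescanned, so replacement is not recursive.
    """
    def repl(match):
        # Split the pair sequence on whitespace and convert each pair.
        return bytes(int(pair, 16) for pair in match.group(1).split())
    return HEX_SEQ.sub(repl, data)

for sample in (b"${hex:40", b"${hex: 40 }", b"${hex:40${hex:40}}"):
    print(sample, b"->", decode_hex(sample))
```

With these assumptions the first two samples come back unchanged and the
third yields ${hex:40@}, matching the examples quoted in this thread.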
> > "${unicode:020000}" -> error
> >
> > Unicode range violation.
>
> no, U+20000 is inside the Unicode range. ${unicode:0020000} fails due
> to not matching unicode-hex (too many digits), ${unicode:200000} fails
> due to being outside the Unicode range.
My mistake, again. So we have:
"${unicode:200000}" -> error
"${unicode:2000000}" -> "${unicode:2000000}"
I don't particularly like that, because most likely the second was
never meant that way. Is there any way to change that at this point?
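The two unicode cases above can be sketched as follows, to show where the
asymmetry comes from. Assumptions: a single code point per expression,
unicode-hex is 1*6HEXDIG, and the only range check is the upper bound
U+10FFFF (surrogates are not considered here). UNICODE_SEQ and
decode_unicode are hypothetical names.

```python
import re

# Assumed grammar: "${unicode:" 1*6HEXDIG "}" with a single code point.
UNICODE_SEQ = re.compile(r"\$\{unicode:([0-9A-Fa-f]{1,6})\}")

MAX_CODE_POINT = 0x10FFFF  # upper end of the Unicode range

def decode_unicode(text: str) -> str:
    """Replace complete ${unicode:...} words; reject out-of-range values.

    Seven or more digits never match the grammar, so such text is left
    as a literal instead of being reported as an error.
    """
    def repl(match):
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            raise ValueError(f"outside the Unicode range: {match.group(0)}")
        return chr(code_point)
    return UNICODE_SEQ.sub(repl, text)

# Seven digits: no grammar match, the text passes through unchanged.
print(decode_unicode("${unicode:2000000}"))

# Six digits above U+10FFFF: grammar match, then a range error.
try:
    decode_unicode("${unicode:200000}")
except ValueError as err:
    print("error:", err)
```

In this sketch the range check only runs after the grammar has matched,
so a value with too many digits silently stays literal.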
Michael