Re: Fixing RFC 1641
At 11:32 AM 11/30/94, Valdis.Kletnieks@vt.edu wrote:
>> I would like to fix this problem so that there will be a means of
>> transmitting Unicode directly, not encoded with UTF-7 or UTF-8, both of
>> which impose some overhead. Clearly this would not be for interoperability
>> with non-Unicode or non-MIME sites, but it would be convenient for
>> communication between sites using Unicode.
>Is there anything Unicode-1-1 needs different from text/plain besides the
>removal of the CRLF restriction? And can you specify the exact nature of
>the problem for those of us who aren't Unicode-literate?
The problem is that the CRLF restriction is defined at the octet level: it
requires that the octet sequence 0D 0A be used for line breaks and only for
line breaks. Since Unicode/10646 is a 16-bit character set, this octet
sequence does not mean a line break, and it can in fact occur within a single
character or across two adjacent characters, e.g.:
0D0A;MALAYALAM LETTER UU
090D;DEVANAGARI LETTER CANDRA E
0A20;GURMUKHI LETTER TTHA
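
To make the octet-level clash concrete, here is a minimal sketch. It assumes
the 16-bit characters are serialized high octet first, and uses Python's
`utf-16-be` codec as a stand-in for that form; the helper names are mine and
do not come from RFC 1641.

```python
# Sketch: look for the octet pair 0D 0A in big-endian 16-bit Unicode text.
# Assumption: characters are sent high octet first (Python's "utf-16-be").

def octets(s):
    """Return the 16-bit big-endian octet form of the string."""
    return s.encode("utf-16-be")

def has_crlf_octets(s):
    """True if the octet pair 0D 0A appears anywhere in the encoded text."""
    return b"\x0d\x0a" in octets(s)

samples = {
    "MALAYALAM LETTER UU": "\u0D0A",
    "DEVANAGARI CANDRA E + GURMUKHI TTHA": "\u090D\u0A20",
    "Unicode CR LF": "\r\n",
}

for name, text in samples.items():
    print(name, octets(text).hex(), has_crlf_octets(text))
```

In the second sample no single character is at fault: the 0D 0A pair straddles
two adjacent characters. The real line break, 000D 000A, never contains the
pair at all.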
The CRLF sequence would be represented in Unicode as 000D 000A, although
many other line break conventions are possible, including use of:
>The problem is that the CRLF convention is dependent on the RFC821 spec
>of CRLF and 1000-char lines, so I don't see an easy way of removing it
>until/unless you either (a) accept some sort of CTE or (b) that you can only
>do it over a connection that uses an SMTP extension to negotiate a binary
RFC 1641 recommended that unicode-1-1 only be used with binary-safe CTEs
such as binary or base64. This is fine for unicode-1-1, as it's not readable
by a recipient without both MIME and Unicode support anyway. However, the
new draft MIME spec doesn't allow that kind of out. It says all subtypes of
text must use CRLF conventions, period, regardless of CTE. My understanding
from discussions on this list is that this is a necessary fact of life for
compatibility with existing software. I would have been happy with a CTE
solution, but the consensus seems to be it has to work the way the new spec
>Would it be acceptable to use unicode-1-1 with some sort of byte-stuffing
>hack rather than the full utf-7 or utf-8, similar to the way rfc821
>specifies doubling a '.' that is by itself on a line?
Well, yes, but that would be yet another transformation format, something
I'd rather avoid.
This is a long term issue, because right now there are few enough binary
transports that Unicode would always get sent as base64 anyway, in which
case you might as well send it as UTF-7. I would like to start sounding out
a solution, however.
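
For a rough sense of the sizes involved, the sketch below compares raw 16-bit
text, the same octets under base64, and UTF-7. It is only an illustration:
Python's `utf-7` and `utf-16-be` codecs and the sample strings are my
assumptions, and real figures depend on the text being sent.

```python
import base64

def sizes(s):
    """Octet counts for three ways of carrying the same text."""
    raw = s.encode("utf-16-be")              # raw 16-bit form, high octet first
    return {
        "raw16": len(raw),
        "base64": len(base64.b64encode(raw)),  # raw form under a base64 CTE
        "utf7": len(s.encode("utf-7")),        # UTF-7 transformation format
    }

ascii_sample = "Hello, world"              # hypothetical ASCII-only text
indic_sample = "\u0D0A\u090D\u0A20"        # the three characters listed above

for label, text in (("ascii", ascii_sample), ("indic", indic_sample)):
    print(label, sizes(text))
```

For ASCII-heavy text UTF-7 comes out smaller than base64 of the raw 16-bit
form, since ASCII letters pass through as single octets.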
10201 N. DeAnza Blvd.
Cupertino, CA 95014-2233