UTF-8 uses 3 bytes per character for scripts such as Thai, Georgian,
and Devanagari. That is a penalty of 50% compared to UTF-16, but 200%
compared to single-byte legacy encodings. For scripts such as Old
Italic, Deseret, and very rare ideographs, UTF-8 uses 4 bytes per
character. That is a penalty of 0% compared to UTF-16, though
potentially 300% compared to imaginary single-byte legacy encodings.
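One way to check these per-character figures is to encode one sample character per script and compare the lengths. The following is a minimal Python sketch. The sample code points are chosen for illustration, 'utf-16-le' is used so that no byte order mark is counted, and the "legacy" baseline is a hypothetical 1-byte encoding rather than any specific charset.

```python
# One illustrative character per script (BMP and supplementary planes).
SAMPLES = {
    "ASCII": "A",
    "Thai": "\u0e01",                # THAI CHARACTER KO KAI
    "Georgian": "\u10d0",            # GEORGIAN LETTER AN
    "Devanagari": "\u0915",          # DEVANAGARI LETTER KA
    "Old Italic": "\U00010300",      # OLD ITALIC LETTER A
    "Deseret": "\U00010400",         # DESERET CAPITAL LETTER LONG I
    "rare ideograph": "\U00020000",  # first CJK Extension B ideograph
}


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for ch, without a byte order mark."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def overhead_percent(size: int, baseline: int) -> float:
    """Size overhead of `size` relative to `baseline`, in percent."""
    return 100.0 * (size - baseline) / baseline


for script, ch in SAMPLES.items():
    utf8, utf16 = encoded_sizes(ch)
    print(
        f"{script:15} UTF-8={utf8} UTF-16={utf16} "
        f"vs UTF-16: {overhead_percent(utf8, utf16):+.0f}% "
        # Hypothetical single-byte legacy encoding as the baseline.
        f"vs 1-byte legacy: {overhead_percent(utf8, 1):+.0f}%"
    )
```

A negative figure in the "vs UTF-16" column (as for ASCII) means UTF-8 is the smaller of the two encodings for that character.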
Assume that in an IETF-defined protocol, the element and attribute
names and many of the attribute values are ASCII. My expectation is
then that the average 'XML Protocol' will easily have an ASCII content
of around 50% or more, even if its text is, e.g., purely Chinese.
ASCII takes 1 byte in UTF-8 but 2 bytes in UTF-16, a penalty of 100%
when moving from UTF-8 to UTF-16. This roughly cancels out the 50%
that UTF-16 saves on common Chinese characters (3 bytes in UTF-8, 2 in
UTF-16), so there is not much to be gained from using UTF-16 in such
cases.
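The break-even point can be made explicit. Suppose a fraction a of the characters is ASCII and the rest are BMP characters such as common Chinese ideographs. UTF-8 then needs 1*a + 3*(1 - a) = 3 - 2a bytes per character on average, against a flat 2 bytes in UTF-16. The two are equal at a = 0.5, and UTF-8 is smaller above that. The following is a minimal Python sketch that measures this. The XML fragment is a hypothetical example, not taken from any actual protocol.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def ascii_share(text: str) -> float:
    """Fraction of the characters in text that are ASCII (code point < 128)."""
    return sum(1 for ch in text if ord(ch) < 128) / len(text)


# Hypothetical XML fragment: ASCII markup around Chinese character data.
fragment = '<item id="7"><title>' + "\u4e2d\u6587\u6807\u9898" + "</title></item>"

# Break-even case: exactly half ASCII, half BMP Chinese characters.
half_and_half = "a" * 50 + "\u4e2d" * 50

for label, text in [("fragment", fragment), ("50/50", half_and_half)]:
    utf8, utf16 = encoded_sizes(text)
    print(f"{label:9} ASCII={ascii_share(text):.0%} UTF-8={utf8} UTF-16={utf16}")
```

In this model, ASCII shares above one half favor UTF-8 and shares below one half favor UTF-16. Supplementary-plane characters take 4 bytes in both encodings and do not shift the balance.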