2) Escape every character outside US-ASCII as a character reference.
This is one choice. The other choice is to convert the text to UTF-8. The advantages of UTF-8 are:
- People can still read the source text
- Overall shorter
- Faster
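To make the trade-off concrete, here is a minimal sketch in Python that applies both choices to the same string. The function name `escape_non_ascii` and the sample text are illustrative assumptions, not part of the original procedure. The sketch relies on Python's built-in `xmlcharrefreplace` error handler, which emits decimal numeric character references.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every character outside US-ASCII with a numeric character reference.

    Hypothetical helper for illustration; uses the standard
    'xmlcharrefreplace' error handler, which produces decimal references.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Assumed sample text containing three non-ASCII characters.
sample = "Größe: 5 €"

# Choice 1: escape everything outside US-ASCII.
escaped = escape_non_ascii(sample)

# Choice 2: keep the characters and encode the text as UTF-8.
utf8_bytes = sample.encode("utf-8")

print(escaped)          # Gr&#246;&#223;e: 5 &#8364;
print(len(escaped))     # 26 bytes once written out as ASCII
print(len(utf8_bytes))  # 14 bytes
```

For this sample the escaped form is nearly twice as long and no longer readable as text, which matches the first two advantages listed for UTF-8. The sketch does not measure speed.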