
X.509 implementation notes



I've been putting together a few notes on X.509 implementation issues over the 
last few days after discussion with other people revealed that there seems to 
be a lot of confusion over how these should be done.  A very rough first draft 
is included below, if anyone has any comments on them please let me know.  
I'll probably put this on the web somewhere when it's finished since it seems 
there's a need for this sort of thing.
 
Just give me a minute to get a fork and some marshmallows before you reply :-).
 
Peter.

                        X.509 Implementation Notes
                        ==========================
 
                 Peter Gutmann, pgut001@cs.auckland.ac.nz
                              30 October 1996
 
Anyone who has had to work with X.509 has probably experienced what can best be
described as ISO water torture, which involves ploughing through all sorts of
ISO, ANSI, ITU, and IETF standards, amendments, meeting notes, draft standards,
committee drafts, working drafts, and other work-in-progress documents, some of
which are best understood when held upside-down in front of a mirror.  Because
of this confusion, no one seems to know how to implement X.509 certificates
properly (see 'Known Bugs' below).  This document is an attempt at providing a
cookbook for certificates which should give you everything that you can't find
anywhere else without devoting your life to the search, as well as comments on
what you'd typically expect to find in certificates.  The conventions used are:
 
- All encodings follow the DER unless otherwise noted.
 
- Most of the formats are ASN.1.  Occasionally 15 levels of indirection are cut
  out to make things easier to understand.
 
- Questions are marked by '=>' at the start.  If anyone has any comments on
  these please let me know.
 
 
Certificate
-----------
 
Certificate ::= SEQUENCE {
    tbsCertificate          TBSCertificate,
    signatureAlgorithm      AlgorithmIdentifier,
    signature               BIT STRING
    }
 
The encoding of the Certificate may follow the BER rather than the DER.  At
least one implementation uses the indefinite-length encoding form for the
SEQUENCE.
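 
As a rough illustration of what accepting the BER form involves, here is a
minimal sketch of a tag/length reader which recognises the indefinite-length
form (length octet 0x80) alongside the definite short and long forms.  The
function name is made up for this example, and it assumes single-octet tags
(tag numbers below 31), which covers everything in a certificate.

```python
def read_header(data: bytes, pos: int = 0):
    """Return (tag, length, header_size) for the TLV starting at pos.

    length is None for the BER indefinite-length form, in which case the
    contents run until an end-of-contents marker (two zero octets).
    Assumes a single-octet tag.
    """
    tag = data[pos]
    first = data[pos + 1]
    if first < 0x80:
        # Short definite form: the length fits in one octet.
        return tag, first, 2
    if first == 0x80:
        # Indefinite form: legal in BER, never produced by a DER encoder.
        return tag, None, 2
    # Long definite form: the low 7 bits give the number of length octets.
    n = first & 0x7F
    length = int.from_bytes(data[pos + 2:pos + 2 + n], "big")
    return tag, length, 2 + n
```

A decoder which gets None back for the outer SEQUENCE has to parse the contents
element by element until it reaches the end-of-contents octets, rather than
trusting a byte count.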
 
 
TBSCertificate
--------------
 
TBSCertificate ::= SEQUENCE {
    version          [ 0 ]  Version DEFAULT v1(0),
    serialNumber            CertificateSerialNumber,
    signature               AlgorithmIdentifier,
    issuer                  Name,
    validity                Validity,
    subject                 Name,
    subjectPublicKeyInfo    SubjectPublicKeyInfo,
    issuerUniqueID    [ 1 ] IMPLICIT UniqueIdentifier OPTIONAL,
    subjectUniqueID   [ 2 ] IMPLICIT UniqueIdentifier OPTIONAL,
    extensions        [ 3 ] Extensions OPTIONAL
    }
 
 
Version
-------
 
Version ::= INTEGER { v1(0), v2(1), v3(2) }
 
The default version is v1(0).  If either the issuerUniqueID or subjectUniqueID
is present then the version must be v2(1) or v3(2).  If extensions are present
then the version must be v3(2).  An implementation should target v3
certificates, which is what everyone is moving towards.
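 
The version rules are simple enough to check mechanically.  The following is
only a sketch: the function names and the boolean flags are invented for the
example, and a real decoder would derive them from the parsed TBSCertificate.

```python
def minimum_version(has_unique_id: bool, has_extensions: bool) -> int:
    """Lowest Version value which is legal for the fields present."""
    if has_extensions:
        return 2    # v3(2)
    if has_unique_id:
        return 1    # v2(1)
    return 0        # v1(0)


def version_is_consistent(version: int, has_unique_id: bool,
                          has_extensions: bool) -> bool:
    """True if the stated version allows the optional fields found."""
    return version >= minimum_version(has_unique_id, has_extensions)
```

For example version_is_consistent(0, False, True) is False, since a v1
certificate can't carry extensions.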
 
 
SerialNumber
------------
 
CertificateSerialNumber ::= INTEGER
 
This should be unique for each certificate issued by a CA (typically a CA will
keep a counter in persistent store somewhere, perhaps a config file under Unix
and in the registry under Windows).  Another way is to take the current time in
seconds and subtract some base time like the first time you ran the software,
to keep the numbers manageable.
 
Serial numbers aren't necessarily restricted to 32-bit quantities.  For example
the RSADSI Commercial Certification Authority serial number is 0x0241000016,
which is larger than 32 bits.  If you're writing certificate-handling code,
just treat them as a blob which happens to be an encoded integer.
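 
If you do need the numeric value, use an arbitrary-precision integer rather
than a fixed-width type.  A minimal sketch, working on the content octets of
the INTEGER only (the tag and length are assumed to have been stripped
already, and the helper names are made up for the example):

```python
def decode_integer_content(content: bytes) -> int:
    """Decode the content octets of a DER INTEGER (two's complement)."""
    return int.from_bytes(content, "big", signed=True)


def encode_integer_content(value: int) -> bytes:
    """Encode a non-negative integer as minimal DER INTEGER content."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    # One extra octet whenever the top bit would otherwise be set, so the
    # value isn't mistaken for a negative number.  Zero encodes as one octet.
    return value.to_bytes(value.bit_length() // 8 + 1, "big")


serial = decode_integer_content(bytes.fromhex("0241000016"))
```

Here serial holds the 40-bit RSADSI value mentioned above, which wouldn't fit
in a 32-bit variable.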
 
 
Signature
---------
 
This rather misnamed field contains the algorithm identifier for the signature
algorithm used by the CA to sign the certificate.  There doesn't seem to be
much use for this field, since anyone who can forge the signature on the cert
can also change the inner algorithm identifier.  You should nonetheless check
that it matches the algorithm identifier of the signature on the cert.  It's
possible that the field was included because of some obscure attack in which
someone who could convince (broken) signature algorithm A to produce the same
signature value as (secure) algorithm B could change the outer, unprotected
algorithm identifier from B to A, but couldn't change the inner identifier
without invalidating the signature.  What this would achieve is unclear.
 
 
Name
----
 
Name ::= SEQUENCE OF RelativeDistinguishedName
 
RelativeDistinguishedName ::= SET OF AttributeValueAssertion
 
AttributeValueAssertion ::= SEQUENCE {
    attributeType           OBJECT IDENTIFIER,
    attributeValue          ANY
    }
 
This is used to encode that wonderful ISO creation, the Distinguished Name
(DN), a path through an X.500 directory information tree (DIT) which uniquely
identifies everything on earth.  Although the RelativeDistinguishedName (RDN)
is given as a SET OF AttributeValueAssertion (AVA), each set should only
contain one element.  However it *may* contain more than one; there has been a
reported case of a certificate which contained more than one element in the
SET.
 
=> How are these handled?  If you've got an entry with a CN and extra
   attributes of email address and phone number it's obvious to a human that
   the CN is the important one, but how do you handle it in software?
 
When encoding sets with cardinality > 1, you need to take care to follow the
DER rules which say that they should be ordered by their encoded values
(although ASN.1 says a SET is unordered, the DER adds ordering rules to ensure
it can be encoded in an unambiguous manner).  What you need to do is encode
each value in the set, then sort them by the encoded values, and output them
wrapped up in the SET OF encoding.
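 
The encode-then-sort procedure described above can be sketched as follows.
The function names are made up for the example, and the elements are assumed
to be complete DER encodings already.  A plain byte-wise sort is enough here
because no valid DER encoding is a prefix of a different one, so the padding
rule for unequal lengths never comes into play.

```python
def der_length(n: int) -> bytes:
    """Encode a definite length in the minimal DER form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_set_of(encoded_elements: list) -> bytes:
    """Wrap already-encoded elements in a DER SET OF (tag 0x31)."""
    # Sort by the encoded octets, not by the decoded values.
    ordered = sorted(encoded_elements)
    content = b"".join(ordered)
    return b"\x31" + der_length(len(content)) + content


# Two toy elements (OCTET STRINGs holding 02 and 01), given out of order.
example = encode_set_of([b"\x04\x01\x02", b"\x04\x01\x01"])
```

The result has the element containing 01 first, regardless of the order in
which the elements were supplied.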
 
You don't have to use a Name for the subject name if you don't want to; there
is a subjectAltName extension which allows use of email addresses or URL's.  If
you want to do this, make the Name an empty sequence and include a
subjectAltName extension and mark it critical.  Because of this, you should be
prepared to accept a zero-length sequence for the subject name in version 3
certificates.
 
Typically you would expect to find the following types of AVA's in an X.509
certificate, starting from the top:
 
countryName     ::= SEQUENCE { { 2 5 4 6 }, StringType( SIZE( 2 ) ) }
organization    ::= SEQUENCE { { 2 5 4 10 }, StringType( SIZE( 1..64 ) ) }
organizationalUnitName
                ::= SEQUENCE { { 2 5 4 11 }, StringType( SIZE( 1..64 ) ) }
commonName      ::= SEQUENCE { { 2 5 4 3 }, StringType( SIZE( 1..64 ) ) }
 
The countryName is the ISO 3166 code for the country.  A StringType is either a
TeletexString, a PrintableString, a UniversalString, or an IA5String.
 
There appears to be some confusion about what format a Name in a certificate
should take.  In theory it should be a full, proper DN, which traces a path
through the X.500 DIT, eg:
 
  C=US/L=Area 51/O=Hangar 13/OU=X.500 Standards Designers/CN=John Doe
 
but since the DIT's usually don't exist, exactly what format the DN should take
seems open to debate.  Some implementations seem to let you stuff anything with
an OID into a DN.
 
=> Are there any guidelines for this?
 
DN's are a very awkward way to handle information about a certificate because
most people will want to associate an email address and a name with a
certificate and no more.  If you want to take the easy way out, use an empty
sequence for the subjectName, provide an email address as a subjectAltName
extension, and mark it critical.
 
 
Validity
--------
 
Validity ::= SEQUENCE {
    notBefore               UTCTime,
    notAfter                UTCTime
    }
 
The IETF recommends that all times be expressed in GMT and seconds not be
encoded, giving:
 
  YYMMDDHHMMZ
 
as the time encoding.  However certificates have been found which include
seconds.  This doesn't lead to an ambiguous encoding because you should never
encode a value of 00 seconds, which means if you read in a UTCTime value
generated by an implementation which doesn't use seconds and write it out again
with an implementation which does, it'll have the same encoding because the 00
won't be encoded.
 
Non-GMT encodings have never been reported, but it may be a good idea to
include handling for the "+/-xxxx" time offset format just in case (but flag it
as a decoding error).
 
In coming up with the world's least efficient time encoding format, the ISO
nevertheless decided to forgo the encoding of centuries, so if you find a year
less than 80, assume it falls in the next century (that is, add 100 to it).
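 
Putting the points of this section together, a UTCTime decoder might look
something like the sketch below.  It accepts the forms with and without
seconds, treats the "+/-xxxx" offset form as a decoding error, and uses the
cutoff of 80 given above for the century.  The function name is invented for
the example, and the input is assumed to be the content of the UTCTime as a
text string.

```python
from datetime import datetime, timezone


def parse_utctime(value: str) -> datetime:
    """Parse YYMMDDHHMMZ or YYMMDDHHMMSSZ into an aware datetime."""
    if not value.endswith("Z"):
        # Covers the "+/-xxxx" offset form, which we choose to reject.
        raise ValueError("UTCTime is not expressed in GMT")
    digits = value[:-1]
    if len(digits) not in (10, 12) or not (digits.isascii() and digits.isdigit()):
        raise ValueError("malformed UTCTime")
    yy = int(digits[0:2])
    # Two-digit year: 80..99 means 19xx, anything lower means 20xx.
    year = 1900 + yy if yy >= 80 else 2000 + yy
    month, day, hour, minute = (int(digits[i:i + 2]) for i in (2, 4, 6, 8))
    # Seconds are optional; a missing field means 00.
    second = int(digits[10:12]) if len(digits) == 12 else 0
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone.utc)
```

Because a missing seconds field is read as 00, "9610300000Z" and
"961030000000Z" decode to the same time.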
 
 
UniqueIdentifier
----------------
 
UniqueIdentifier ::= BIT STRING
 
These were added in X.509v2 to handle the possible reuse of subject and/or
issuer names over time.  Their use is deprecated by the IETF.  If you're
writing certificate-handling code, just treat them as a blob which happens to
be an encoded bitstring.
 
No occurrences of a UniqueIdentifier have been reported.
 
 
Extensions
----------
 
Extensions ::= SEQUENCE OF Extension
 
Extension ::= SEQUENCE {
    extnId                  OBJECT IDENTIFIER,
    critical                BOOLEAN DEFAULT FALSE,
    extnValue               OCTET STRING
    }
 
X.509 certificate extensions are like a LISP property list: an ISO-standardised
place to store crufties.  Extensions can consist of key and policy
information, certificate subject and issuer attributes, certificate path
constraints, CRL distribution points, and private extensions.
 
[There are large numbers of these things.  If people can tell me what they've
 encountered in the past I'll document them]
 
Netscape have quite a few defined for things like CRL URL's and SET defines a
few as well, but none have ever been spotted in the wild.
 
 
Open Issues
-----------
 
The creation of a no-authentication certificate (which is the equivalent of an
unsigned PGP key which states something like "No one has certified this key, but
here it is anyway") is possible using self-signed certificates.  Technically
this isn't a legal way to certify a key, and Netscape will reject these keys.
It may be possible to provide an equivalent mechanism with a null signature OID
and signature.
 
 
Known Bugs
----------
 
The following bugs are known to exist in X.509 (and related) implementations
from different vendors:
 
Verisign
 
  Their PKCS #10 stuff is wrong.  According to the PKCS #10 spec, the
  attributes field is mandatory, so if it is empty it is encoded as NULL.
  Verisign's software however assumes that if there are no attributes, the
  field should not be present, treating it like an OPTIONAL field.  This is,
  unfortunately, what the example in PKCS #10 does - there are no attributes so
  it leaves them out, contrary to the spec a few pages earlier.
 
SHTTP specification
 
  There is at least one invalid PKCS #7 example in earlier versions of the spec.
  The most recent draft <draft-ietf-wts-shttp-03.txt> July 1996 (Expires
  January-97) fixes this.  Implementors should ensure they are working with the
  latest version of the draft.
 
COST
 
  The lengths of some of the fields in CRL's are broken.  Specifically, the
  lengths of some sequences are calculated incorrectly, so if your code merely
  nods at things like SET and SEQUENCE tags and lengths as they whiz past and
  then works with the individual fields it'll work, but if it tries to work
  with the length values given (for example making sure the lengths of the
  components of a sequence add up correctly) then it'll break.  The sequence
  lengths are longer than the amount of data in the sequence, so the COST code
  may be adding the lengths of the elements of the sequence incorrectly (it's
  a bit hard to tell what's going wrong.  Basically the CRL's are broken).
 
Microsoft
 
  Will create certificates with one of the weirder 32-bit wide character string
  types being used to encode all fields in the DN (probably a BMPString
  containing straight Unicode).  While not illegal, it's a truly Microsoft way
  of using ASN.1.
 
Netscape
 
  Invalid encoding of some (but only some) occurrences of the integer value 0
  (encoded as integer, length = 0 rather than integer, length = 1, value = 0).
  How they can get it right in one place and then wrong 50 bytes later is a
  puzzle.
 
 
Sample Certificates
-------------------
 
Include some samples:
 
- Basic v1.
- Basic v3 with C, O, OU, and CN in the DN.
- Basic v3 with BER encoding of outer wrapper.
- v3 with other fields in the DN (what would be used?).
- v3 with extensions (what would be used?).
- Microsoft certificates with oddball string types.
 
 
Acknowledgements
----------------
 
Eric Young provided lots of useful information on encoding issues and bugs.