RE: SFL performance problem with encryption and decryption of large messages

I experienced the same problem you mention: performance issues with messages
larger than 1MB.  After analysis I came to the same conclusion about the
cause.  Performance was important for the product we have been developing,
too.  My approach to solving the problem was a little different from yours.
I pre-allocated the memory in CSM_DataToEncrypt::Encrypt(), since it is easy to
calculate the size of an encrypted message from its plaintext size (assuming
PKCS#5 padding is used).  You round the plaintext size up to the next multiple
of BLOCK_SIZE (for a given cipher), or add BLOCK_SIZE bytes if the plaintext
size is already a multiple of the block size (refer to PKCS#5 for more info).
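
The sizing rule above can be sketched with integer arithmetic alone.  This is
a minimal illustration, not SFL code: PaddedSize is a hypothetical helper name,
and it assumes the padding always adds between 1 and blockSize bytes, as
PKCS#5 padding does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of SFL): size of the ciphertext produced by a
// block cipher when PKCS#5-style padding is applied to plaintextLen bytes.
std::size_t PaddedSize(std::size_t plaintextLen, std::size_t blockSize)
{
    assert(blockSize > 0);
    // Padding adds between 1 and blockSize bytes, so the result is always the
    // number of complete plaintext blocks plus one, times the block size.
    return (plaintextLen / blockSize + 1) * blockSize;
}
```

With an 8-byte block, a 5-byte plaintext needs 8 bytes and an 8-byte plaintext
needs 16, which matches the two cases described above.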

This "fix" worked very well for us.  It improved performance a lot (all the
calls related to memory allocation were avoided).  If you are interested, here
are some modifications that I made to the code (we still use an old version of
SFL [1.10]):

===> File:   sm_Encrypt.cpp
===> Class:  CSM_DataToEncrypt
===> Method: Encrypt()
===> Line:   ~1140

void CSM_DataToEncrypt::Encrypt(CSMIME *pCSMIME,
                                    CSM_MsgCertCrls *pMsgCertCrls,
                                    CSM_RecipientInfoLst *pRecipients,
                                    CSM_OID *poidContentType,
                                    CSM_Buffer *pContent,
                                    CSM_Alg *pContentEncryptionAlg,
                                    CSM_Buffer *pOutputBuf)
{
   // Added to allocate the memory up front and avoid growing the buffer in
   // 8-byte chunks (the right size is calculated using the newly added
   // function CalculateResultSize).
   CSM_Buffer bufEncryptedContent(CalculateResultSize(*pContent,
                                       SM_COMMON_3DES_BLOCKSIZE));

   CSM_Buffer bufContentParameters, bufMEK;

   // ... (rest of the method not shown)
}

===> File:   sm_Encrypt.cpp
===> Class:  CSM_DataToEncrypt
===> Method: CalculateResultSize [*NEW*]

// CSM_DataToEncrypt::CalculateResultSize can be used to precompute the
// size of the buffer that will contain the result of an encryption.
// The size of the plaintext and the cipher block size are used to do so.
size_t CSM_DataToEncrypt::CalculateResultSize(CSM_Buffer& rBufPlaintext,
                                              int iCipherBlockSize)
{
    // We first get the number of complete blocks (floor)
    double dNbBlocksNotRounded =
        static_cast<double>(rBufPlaintext.Length()) /
        static_cast<double>(iCipherBlockSize);
    int iNbBlocks = static_cast<int>(floor(dNbBlocksNotRounded));

    // According to PKCS#5 padding, (k - (l mod k)) more bytes are used for the
    // padding, so there is always one block more than the complete ones.
    iNbBlocks = iNbBlocks + 1;

    return(iNbBlocks * iCipherBlockSize);
}

P.S.: The code may not seem perfect because the value SM_COMMON_3DES_BLOCKSIZE
is hardcoded, but remember that the block ciphers used in the S/MIME v2
standard operate on a 64-bit block size.

| Simon Blanchet, B.Sc.Comp.Sc.
| Software Designer

-----Original Message-----
From: owner-imc-sfl@xxxxxxxxxxxx [mailto:owner-imc-sfl@xxxxxxxxxxxx]On
Behalf Of John Stark
Sent: Tuesday, April 23, 2002 1:01 PM
To: imc-sfl@xxxxxxx
Cc: Jim Craigie; David Lamkin
Subject: SFL performance problem with encryption and decryption of large
messages

I have written a test suite for our application that uses SFL, and two of
the tests that this performs are encryption and decryption of a 10MB
message.

I originally wrote the test while using SFL version 1.10.  I found that
encryption and decryption were taking an implausibly long time, several
minutes even with a 1MB message, and with a 10MB message it took several

I tracked this problem down to the routine AllocMoreMem() in
SNACC/c++-lib/src/sm_buffer.cpp.  Each time 8 bytes of ciphertext was being
output, this routine was being called.  Each call allocated a new memory
block (zero-initialised), copied the existing data across, added the next 8
bytes, then freed the original block.  This behaviour (assuming CPU usage
time proportional to amount of data copied) resulted in an O(n^2)
degradation with increasing message size n.
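
As a rough model of the behaviour described above (not a measurement of SFL
itself), the copying cost can be counted directly.  BytesCopiedFixedStep is a
hypothetical name, and the model assumes every growth step copies all existing
data into a fresh block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: total bytes copied while a buffer grows to totalBytes
// in fixed steps, when every step copies the existing contents to a new block.
std::uint64_t BytesCopiedFixedStep(std::uint64_t totalBytes, std::uint64_t step)
{
    assert(step > 0);
    std::uint64_t copied = 0;
    for (std::uint64_t len = 0; len + step <= totalBytes; len += step)
        copied += len;   // existing data is copied before the next step is added
    return copied;
}
```

Under this model, growing to 1 MiB in 8-byte steps copies about 64 GiB in
total, and a buffer ten times larger costs roughly a hundred times as much.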

In SFL 1.10 I was able to fix this problem.  I changed the code in
AllocMoreMem() from this:

    if ((pNew = (char *)calloc(1, lLength + lSize)) == NULL)
    if (m_pMemory)
       memcpy(pNew, m_pMemory, lLength);

to this:

    // Round up the allocation length - if >= 512 bytes but < 64kB then
    // round up to the next kB, if >= 64kB then round up to the next 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
    {
      pNew = (char *)realloc(m_pMemory, allocLength);
      m_pMemory = NULL;
    }
    else pNew = (char *)malloc(allocLength);

The aim of the rounding up was to ensure that most of the realloc() calls made
to add an extra 8 bytes request the size the block already has, and so are
effectively no-ops that the runtime library can optimise away.  The figures I
chose were an arbitrary compromise between performance and memory wastage.
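
The rounding rule can be looked at in isolation.  The sketch below is a
standalone restatement of the two masks above, using a hypothetical function
name and std::size_t in place of SM_SIZE_T.

```cpp
#include <cstddef>

// Hypothetical standalone version of the rounding rule described above.
std::size_t RoundUpAllocLength(std::size_t needed)
{
    const std::size_t k64K = 0x10000;
    const std::size_t k1K  = 0x400;
    if (needed >= k64K)
        return (needed + (k64K - 1)) & ~(k64K - 1);   // next multiple of 64kB
    if (needed >= 0x200)
        return (needed + (k1K - 1)) & ~(k1K - 1);     // next multiple of 1kB
    return needed;                                    // small buffers: exact size
}
```

Because consecutive 8-byte requests mostly round to the same value, a buffer
growing to 10MB passes through only about 160 distinct sizes above 64kB.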

This fix also worked when applied to the SNACC sm_buffer.cpp source up to
version 3 release 8.  It allowed my 10MB encrypt and decrypt tests to run in
20-30 seconds, which is reasonable for the system I was using (a PIII-500).
Performance appeared to be linear with message size; evidently realloc()
eventually finds a big space in the heap and can then extend repeatedly
without further copying.

However, I have just tried migrating to SNACC version 3 release 10, and the
latest set of SFL 2.0.1 patches.  One of the improvements claimed for this
patch set was improved performance.  AllocMoreMem() in sm_buffer.cpp looks
similar to before, but a lot of the code has been modified and it now
manages some extra class members that weren't referred to in earlier
versions.

I tried building my test suite with the sm_buffer.cpp as supplied.  Although
nowhere near as disastrous as with SFL 1.10, performance of the 10MB encrypt
and decrypt tests was poor, taking 2-3 minutes per test.  This is
unacceptable for the product that my customer is about to release.  I
instrumented AllocMoreMem() and found that it still had the "calloc+memcpy
every X bytes" behaviour, but the invoking code had been modified to extend
the memory by 3200 bytes at a time rather than 8, hence the improvement in
performance.  I.e. the code design hadn't been fixed properly, but it had
been tweaked to give OK performance with 1MB messages though not with 10MB
messages.

I tried reapplying my "realloc" fix to the new sm_buffer.cpp source.  The
new source also needed a check to not free m_pAllocPtr if it was equal to
m_pMemory which I had just realloc'ed.  However, with this release of SNACC
and SFL, the result of doing this was severely unstable behaviour, with
coredumps occurring consistently, usually in destructors.

I also tried adding code to clear the extra memory added to the buffer to
zeros - which the supplied code does by virtue of its use of calloc() -
though that also needlessly clears the memory that it memcpy's into, and
this enhancement wasn't needed with previous releases.  That made things
more stable, but my test suite still coredumped in the realloc() call during
one of the tests, suggesting corruption of the heap.  It seems that what
I've done interacts in some way with other code.
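
For illustration only, the "realloc, then clear the added memory" step can be
written as a free function.  GrowZeroed is a hypothetical name and is not part
of SNACC or SFL.  The sketch assumes the whole newly added region, including
any rounding slack, should read as zero, as it would after calloc(), and it
does not address the interaction with the other class members.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical free function (not the SNACC API): grow a malloc'ed block from
// oldLen to newLen bytes and zero-fill the added tail, as calloc() would have.
// Returns NULL and leaves the old block untouched if realloc() fails.
char *GrowZeroed(char *p, std::size_t oldLen, std::size_t newLen)
{
    char *pNew = static_cast<char *>(std::realloc(p, newLen));
    if (pNew == NULL)
        return NULL;                                      // caller still owns p
    if (newLen > oldLen)
        std::memset(pNew + oldLen, 0, newLen - oldLen);   // clear the whole tail
    return pNew;
}
```

Keeping the result in a separate pointer until realloc() has succeeded avoids
losing the original block on failure.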

My final failed attempt at optimising AllocMoreMem() in the current SNACC
release is as attached below.  Can anyone tell me how to fix it so as to
achieve acceptable performance with 10 MB messages?

John Stark
E-mail: jas@xxxxxxxxxxxx
Tel: +44 (0) 1223 566732
Fax: +44 (0) 1223 566727
Mobile: +44 (0) 7968 110628


void CSM_Buffer::AllocMoreMem(SM_SIZE_T lSize)
{
    char *pNew;
    SM_SIZE_T lLength = Length();

    // Round up the allocation length - if >= 512 bytes but < 64kB then
    // round up to the next kB, if >= 64kB then round up to the next 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
    {
      pNew = (char *)realloc(m_pMemory, allocLength);
      memset(pNew+lLength, 0x0, lSize);
    }
    else pNew = (char *)calloc(1, allocLength);

    // Update the "file pointer".
    m_pMemFP = pNew + lLength;

    // Clear m_pCache if it points at m_pAllocPtr.
    // If m_pAllocPtr is separate from m_pMemory, free it.
    if (m_pAllocPtr != NULL)
    {
       if (m_pCache == (char *)m_pAllocPtr)
         m_pCache = NULL;
       if (m_pAllocPtr != (unsigned char *)m_pMemory) free(m_pAllocPtr);
       m_pAllocPtr = NULL;
    }

    // Free the cache block if separate.
    if (m_pCache)
       m_pCache = NULL;

    // Now set the new values.
    m_lCacheSize = lLength + lSize;    // NEW max memory size.
    m_pMemory = pNew;
    m_pAllocPtr = (unsigned char *)pNew;
    m_pAllocPtrSize = m_lCacheSize;
}