
RE: SFL performance problem with encryption and decryption of large messages


I apologize for these performance problems.  We are aware of the large
file limitations and have plans to improve the overall SNACC buffer
handling for all operations, encoding, decoding and basic read/write to
buffers in general (after R2.1, due late June, 2001).

As a general patch, the sm_buffer.cpp has been updated similar to the
solution suggested by John Stark below.  I had updated the logic to
allocate memory based on 20 X the requested size on allocation as a
performance compromise.  Unfortunately, in my attempts to update the
CSM_Buffer class buffer handling I noticed a number of inconsistencies
when dealing with file and/or memory handling and the memory allocation
mechanisms, so I cannot provide the full list of improvements.  These
attempts destabilized the general SFL logic until I tracked down each
inconsistent use of the CSM_Buffer class writes and updated them (in
general these fixes were due to inconsistent use of the CSM_Buffer API;
most users will not see these inconsistencies).  This patch contains
ONLY the encrypt/decrypt buffer handling improvements; the next release
will provide a more complete CSM_Buffer update.

I am providing the sm_buffer.cpp file that I have delivered to several
customers as a patch to the SFL R2.0.1; you will require the patch "a"
that should be available on the web site.  If I had seen any indication
of other customer problems with buffer handling, I would have provided
this patch earlier to the general list.  I am listing the
CSM_Buffer::Write(...) update for encryption (this patch was tested with
the SFL R2.0.1 with patch "a"):

<<<< in ./SNACC/c++-lib/src/sm_buffer.cpp
SM_RET_VAL CSM_Buffer::Write(const char *pBuffer, SM_SIZE_T lSize)
   SM_RET_VAL lRet = 0;
   bool appendMode = false;
   bool firstTimeFlag = false;


   long lSizeExtra=lSize;
   if (lSizeExtra < 100000)
      lSizeExtra *= 10;
   if (lSizeExtra < 10000)
      lSizeExtra *= 40;    // Allocate extra memory to avoid re-allocing
                           //  and re-copying memory many-many-many
   if ((pBuffer == NULL) || (lSize <= 0))

Bob Colestock

-----Original Message-----
From: Simon Blanchet [mailto:sblanche@xxxxxxxxxx]
Sent: Tuesday, April 23, 2002 2:53 PM
To: imc-sfl@xxxxxxx; John Stark
Cc: Philippe Leroux; David Lamkin; Jim Craigie
Subject: RE: SFL performance problem with encryption and decryption of
large messages

I experienced the same problem you mention: performance issues with
messages of size > 1MB.  After analysis I came to the same conclusion
about the reason why it happened.  For the product that we have been
developing, performance was important too.  My approach to solving the
problem was a bit different from yours.  I first pre-allocated the
memory in CSM_DataToEncrypt::Encrypt(), since it's easy to calculate the
resulting size of an encrypted message given its plaintext size
(assuming PKCS#5 padding is used).  In fact, all you have to do is round
the size up to the next multiple of BLOCK_SIZE (for a given cipher), or
add BLOCK_SIZE bytes if the plaintext size is already a multiple of the
block size (refer to PKCS#5 for more info).

This "fix" worked really well for us.  It did improve the performance
(all the calls related to memory allocation were avoided).  If you are
interested, here are some modifications that I made to the code (we
still use an old version of SFL [1.10]):

===> File:   sm_Encrypt.cpp
===> Class:  CSM_DataToEncrypt
===> Method: Encrypt()
===> Line:   ~1140

void CSM_DataToEncrypt::Encrypt(CSMIME *pCSMIME,
                                    CSM_MsgCertCrls *pMsgCertCrls,
                                    CSM_RecipientInfoLst *pRecipients,
                                    CSM_OID *poidContentType,
                                    CSM_Buffer *pContent,
                                    CSM_Alg *pContentEncryptionAlg,
                                    CSM_Buffer *pOutputBuf)
   // Added to allocate memory first and avoid allocating mem chunk by 8
   // (the right size is calculated using the new added function
   CSM_Buffer bufEncryptedContent(CalculateResultSize(*pContent,

   CSM_Buffer bufContentParameters, bufMEK;

===> File:   sm_Encrypt.cpp
===> Class:  CSM_DataToEncrypt
===> Method: CalculateResultSize [*NEW*]

// CSM_DataToEncrypt::CalculateResultSize can be used to precompute the
// size of the buffer that will contain the result of an encryption.
// The size of the plaintext and the cipher block size are used to do so.
size_t CSM_DataToEncrypt::CalculateResultSize(CSM_Buffer& rBufPlaintext,
    // We first get the nb of complete blocks (floor)
    double dNbBlocksNotRounded =
static_cast<double>(static_cast<double>(rBufPlaintext.Length()) /
    int iNbBlocks = static_cast<int>(floor((dNbBlocksNotRounded)));

    // According to PKCS#5 Padding (k - (l mod k)) more bytes are used
    // for the
    iNbBlocks = iNbBlocks + 1;

    return(iNbBlocks * iCipherBlockSize);

P.S.: The code may not seem perfect because the value
is hardcoded, but remember that the block ciphers used in SMIME v2
operate on a 64-bit block size.
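
For readers who want to avoid the floating-point arithmetic, the same
PKCS#5 size rule can be written with integer division.  This is only a
minimal sketch: paddedSize() and its parameters are hypothetical names,
not part of SFL, and the block size is passed in rather than hardcoded.

```cpp
#include <cstddef>

// Hypothetical helper, not part of SFL: size of the output produced by a
// block cipher when PKCS#5-style padding is applied.  Padding always adds
// between 1 and blockSize bytes, so an input that is already a multiple of
// the block size grows by one full block.
std::size_t paddedSize(std::size_t plaintextLen, std::size_t blockSize)
{
    return (plaintextLen / blockSize + 1) * blockSize;
}
```

With an 8-byte block, 7 bytes of plaintext give 8 bytes of output and
8 bytes of plaintext give 16.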

| Simon Blanchet, B.Sc.Comp.Sc.
| Software Designer

-----Original Message-----
From: owner-imc-sfl@xxxxxxxxxxxx [mailto:owner-imc-sfl@xxxxxxxxxxxx]On
Behalf Of John Stark
Sent: Tuesday, April 23, 2002 1:01 PM
To: imc-sfl@xxxxxxx
Cc: Jim Craigie; David Lamkin
Subject: SFL performance problem with encryption and decryption of large
messages

I have written a test suite for our application that uses SFL, and two
of the tests that this performs are encryption and decryption of a 10MB
message.

I originally wrote the test while using SFL version 1.10.  I found that
encryption and decryption were taking an implausibly long time, several
minutes even with a 1MB message, and with a 10MB message it took several

I tracked this problem down to the routine AllocMoreMem() in
SNACC/c++-lib/src/sm_buffer.cpp.  Each time 8 bytes of ciphertext was
output, this routine was being called.  Each call allocated a new memory
block (zero-initialised), copied the existing data across, added the
next 8 bytes, then freed the original block.  This behaviour (assuming
CPU time proportional to amount of data copied) resulted in an O(n^2)
degradation with increasing message size n.
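
The quadratic cost can be illustrated by simply counting the bytes that
get copied.  The sketch below is not SFL code; bytesCopiedFixedStep() is
a hypothetical function that models "allocate a new block and copy
everything across" for each fixed-size append.

```cpp
#include <cstddef>
#include <cstdint>

// Counts the bytes copied while a buffer grows to `total` bytes in fixed
// steps of `step` bytes, where every step allocates a fresh block and
// copies the existing contents across before appending.
std::uint64_t bytesCopiedFixedStep(std::size_t total, std::size_t step)
{
    std::uint64_t copied = 0;
    for (std::size_t length = 0; length < total; length += step)
        copied += length;   // existing data is copied before each append
    return copied;
}
```

For a 1MB message written 8 bytes at a time this model gives
68,718,952,448 bytes copied, and doubling the message size roughly
quadruples that figure.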

In SFL 1.10 I was able to fix this problem.  I changed the code in
AllocMoreMem() from this:

    if ((pNew = (char *)calloc(1, lLength + lSize)) == NULL)
    if (m_pMemory)
       memcpy(pNew, m_pMemory, lLength);

to this:

    // Round up the allocation length - if > 512 bytes but < 64kB then
    // round up to the nearest kB, if >= 64kB then round up to the next
    // 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
    {
      pNew = (char *)realloc(m_pMemory, allocLength);
      m_pMemory = NULL;
    }
    else pNew = (char *)malloc(allocLength);

The aim of the rounding up was to ensure that most of the realloc()
calls to add an extra 8 bytes were to the existing size, and effectively
a no-op, which the runtime library optimises to do nothing.  The figures
I chose are an arbitrary compromise between performance and memory
wastage.
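
The rounding rule can be pulled out into a standalone function to make
its effect easier to check.  This is a sketch only: roundUpAllocLength()
is a hypothetical name, and std::size_t stands in for SM_SIZE_T.

```cpp
#include <cstddef>

// Hypothetical standalone version of the rounding rule quoted above:
// lengths of 64kB or more round up to a multiple of 64kB, lengths of
// 512 bytes or more round up to a multiple of 1kB, and smaller lengths
// are left unchanged.
std::size_t roundUpAllocLength(std::size_t length)
{
    const std::size_t kb64 = 0x10000;
    const std::size_t kb1 = 0x400;
    if (length >= kb64)
        return (length + kb64 - 1) & ~(kb64 - 1);
    if (length >= 0x200)
        return (length + kb1 - 1) & ~(kb1 - 1);
    return length;
}
```

Once the buffer is past 64kB, 8192 consecutive 8-byte appends map to the
same rounded length, so most realloc() calls ask for the size the block
already has.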

This fix also worked when applied to the SNACC sm_buffer.cpp source up
to version 3 release 8.  It allowed my 10MB encrypt and decrypt tests to
run in 20-30 seconds, which is reasonable for the system I was using (a
Performance appeared to be linear with message size; obviously realloc()
eventually finds a big space in the heap and can then extend repeatedly
without further copying.

However, I have just tried migrating to SNACC version 3 release 10, and
the latest set of SFL 2.0.1 patches.  One of the improvements claimed
for this patch set was improved performance.  AllocMoreMem() in
sm_buffer.cpp is similar to before, but a lot of the code has been
modified and it now manages some extra class members that weren't
referred to in earlier
I tried building my test suite with the sm_buffer.cpp as supplied.
While nowhere near as disastrous as with SFL 1.10, performance of the
10MB encrypt and decrypt tests was poor, taking 2-3 minutes per test.
This is unacceptable for the product that my customer is about to
release.  I instrumented AllocMoreMem() and found that it still had the
every X bytes" behaviour, but the invoking code had been modified to
extend the memory by 3200 bytes at a time rather than 8, hence the
improvement in performance.  I.e. the code design hadn't been fixed
properly, but it had been tweaked to give OK performance with 1MB
messages though not with
I tried reapplying my "realloc" fix to the new sm_buffer.cpp source.
The new source also needed a check to not free m_pAllocPtr if it was
equal to m_pMemory, which I had just realloc'ed.  However, with this
release of SNACC and SFL, the result of doing this was severely unstable
behaviour, with coredumps occurring consistently, usually in
destructors.

I also tried adding code to clear the extra memory added to the buffer
to zeros - which the supplied code does by virtue of its use of
calloc() - though that also needlessly clears the memory that it
memcpy's into, and this enhancement wasn't needed with previous
releases.  That made things more stable, but my test suite still
coredumped in the realloc() call in one of the tests, suggesting
corruption of the heap.  It seems that what I've done interacts in some
way with other code.

My final failed attempt at optimising AllocMoreMem() in the current
release is as attached below.  Can anyone tell me how to fix it so as to
achieve acceptable performance with 10 MB messages?

John Stark
E-mail: jas@xxxxxxxxxxxx
Tel: +44 (0) 1223 566732
Fax: +44 (0) 1223 566727
Mobile: +44 (0) 7968 110628


void CSM_Buffer::AllocMoreMem(SM_SIZE_T lSize)
{
    char *pNew;
    SM_SIZE_T lLength = Length();

    // Round up the allocation length - if > 512 bytes but < 64kB then
    // round up to the nearest kB, if >= 64kB then round up to the next
    // 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
    {
      pNew = (char *)realloc(m_pMemory, allocLength);
      memset(pNew+lLength, 0x0, lSize);
    }
    else pNew = (char *)calloc(1, allocLength);

    // Update the "file pointer".
    m_pMemFP = pNew + lLength;

    // Clear m_pCache if it points at m_pAllocPtr.
    // if m_pAllocPtr is separate from m_pMemory, free it.
    if (m_pAllocPtr != NULL)
    {
       if (m_pCache == (char *)m_pAllocPtr)
         m_pCache = NULL;
       if (m_pAllocPtr != (unsigned char *)m_pMemory) free(m_pAllocPtr);
       m_pAllocPtr = NULL;
    }

    // Free the cache block if separate.
    if (m_pCache)
       m_pCache = NULL;

    // Now set the new values.
    m_lCacheSize = lLength + lSize;    // NEW max memory size.
    m_pMemory = pNew;
    m_pAllocPtr = (unsigned char *)pNew;
    m_pAllocPtrSize = m_lCacheSize;
}


Attachment: sm_buffer.zip
Description: sm_buffer.zip