
SFL performance problem with encryption and decryption of large messages



I have written a test suite for our application, which uses SFL, and two of
the tests it performs are encryption and decryption of a 10MB message.

I originally wrote the test while using SFL version 1.10.  I found that
encryption and decryption were taking an implausibly long time, several
minutes even with a 1MB message, and with a 10MB message it took several
hours.

I tracked this problem down to the routine AllocMoreMem() in
SNACC/c++-lib/src/sm_buffer.cpp.  This routine was called every time 8 bytes
of ciphertext were output.  Each call allocated a new, zero-initialised
memory block, copied the existing data into it, appended the next 8 bytes,
then freed the original block.  Building an n-byte message therefore takes
about n/8 calls, each copying up to n bytes, so (assuming CPU time is
proportional to the amount of data copied) the running time grows as O(n^2)
with message size n.
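
As a rough illustration of that O(n^2) behaviour, here is a minimal C++
model (not SFL or SNACC code) that counts the bytes memcpy'd when a buffer
grows from empty in fixed steps and every step copies all existing data into
a fresh block.  The function name naive_bytes_copied is hypothetical, and
the model ignores the calloc() zero-fill, which grows in the same way.

```cpp
#include <cassert>
#include <cstdint>

// Model of the old AllocMoreMem() pattern: every call allocates a fresh
// block of (current length + step) bytes and memcpy's the existing data
// into it, so a call made when the buffer holds `length` bytes copies
// `length` bytes.
// Returns the total number of bytes copied while growing a buffer from
// empty to `total` bytes in increments of `step` bytes.
std::uint64_t naive_bytes_copied(std::uint64_t total, std::uint64_t step)
{
    assert(step > 0);
    std::uint64_t copied = 0;
    for (std::uint64_t length = 0; length < total; length += step)
        copied += length;  // existing data copied into the new block
    return copied;
}
```

For a 1MB message (1,048,576 bytes) in 8-byte steps this model gives
68,718,952,448 bytes copied, and a 10MB message gives about 100 times as
much: ten times the data, a hundred times the copying.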

In SFL 1.10 I was able to fix this problem.  I changed the code in
AllocMoreMem() from this:

    if ((pNew = (char *)calloc(1, lLength + lSize)) == NULL)
        SME_THROW(SM_MEMORY_ERROR, NULL, NULL);
    if (m_pMemory)
    {
       memcpy(pNew, m_pMemory, lLength);
    }

to this:

    // Round up the allocation length: if >= 512 bytes but < 64kB, round up
    // to a multiple of 1kB; if >= 64kB, round up to a multiple of 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
    {
      pNew = (char *)realloc(m_pMemory, allocLength);
      m_pMemory = NULL;
    }
    else pNew = (char *)malloc(allocLength);
    if (pNew == NULL) SME_THROW(SM_MEMORY_ERROR, NULL, NULL);

The aim of the rounding up was to ensure that most of the realloc() calls
that add another 8 bytes request the size that is already allocated, making
them effectively no-ops which the runtime library optimises to do nothing.
The thresholds I chose were an arbitrary compromise between performance and
wasted memory.
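
The rounding rule can be checked in isolation.  The sketch below reproduces
the arithmetic from the modified AllocMoreMem() in a standalone function
(round_alloc_length and size_changes are hypothetical helper names, not part
of SNACC) and counts how many of the 8-byte growth requests actually ask
realloc() for a different size.

```cpp
#include <cassert>
#include <cstddef>

// Same rounding rule as the modified AllocMoreMem(): sizes below 512 bytes
// are unchanged, sizes from 512 bytes up to (but not including) 64kB are
// rounded up to a multiple of 1kB, and larger sizes to a multiple of 64kB.
std::size_t round_alloc_length(std::size_t length)
{
    if (length >= 0x10000)
        return (length + 0xffff) & ~static_cast<std::size_t>(0xffff);
    if (length >= 0x200)
        return (length + 0x3ff) & ~static_cast<std::size_t>(0x3ff);
    return length;
}

// Counts how many of the realloc() calls made while growing a buffer from
// empty to `total` bytes, `step` bytes at a time, request a size different
// from the one already allocated; all other calls ask for the same size.
std::size_t size_changes(std::size_t total, std::size_t step)
{
    assert(step > 0);
    std::size_t changes = 0;
    std::size_t allocated = 0;
    for (std::size_t length = step; length <= total; length += step) {
        const std::size_t wanted = round_alloc_length(length);
        if (wanted != allocated) {
            ++changes;
            allocated = wanted;
        }
    }
    return changes;
}
```

Growing to 1MB in 8-byte steps makes 131,072 requests, but only 142 of them
change the requested size.  Whether the remaining same-size realloc() calls
are really free depends on the C runtime.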

This fix also worked when applied to the SNACC sm_buffer.cpp source up to
version 3 release 8.  It allowed my 10MB encrypt and decrypt tests to run in
20-30 seconds, which is reasonable for the system I was using (a PIII-500).
Performance appeared to be linear in message size; presumably realloc()
eventually finds a large free region of the heap and can then extend the
block repeatedly without further copying.

However, I have just tried migrating to SNACC version 3 release 10 and the
latest set of SFL 2.0.1 patches, one of whose claimed benefits is better
performance.  AllocMoreMem() in sm_buffer.cpp looks similar to before, but a
lot of the code has been modified, and it now manages some extra class
members that earlier versions did not refer to.

I tried building my test suite with sm_buffer.cpp as supplied.  Although
nowhere near as disastrous as with SFL 1.10, performance of the 10MB encrypt
and decrypt tests was poor, taking 2-3 minutes per test.  This is
unacceptable for the product that my customer is about to release.  I
instrumented AllocMoreMem() and found that it still had the "calloc+memcpy
every X bytes" behaviour, but the calling code had been modified to extend
the memory by 3200 bytes at a time rather than 8, hence the improvement in
performance.  In other words, the underlying design had not been fixed; it
had merely been tuned to give acceptable performance with 1MB messages,
though not with 10MB ones.
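
A closed-form version of the calloc+memcpy copying model shows why
3200-byte steps help but do not remove the problem.  Assuming every step
copies all existing data, the bytes copied are step * m * (m - 1) / 2 for
m = ceil(total / step) reallocations, so a larger step shrinks the constant
while the total still grows with the square of the message size.
copied_bytes is a hypothetical name for this simplified model.

```cpp
#include <cassert>
#include <cstdint>

// Closed form of the calloc()+memcpy() copying model: growing a buffer to
// `total` bytes in `step`-byte increments takes m = ceil(total / step)
// reallocations, and the k-th one (k = 0..m-1) copies k * step bytes.
std::uint64_t copied_bytes(std::uint64_t total, std::uint64_t step)
{
    assert(step > 0);
    const std::uint64_t m = (total + step - 1) / step;  // ceil(total / step)
    if (m == 0)
        return 0;
    return step * m * (m - 1) / 2;  // m * (m - 1) is always even
}
```

With 3200-byte steps this gives 171,609,600 bytes for 1MB but
17,176,723,200 bytes for 10MB, roughly 100 times as much for ten times the
data.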

I tried reapplying my "realloc" fix to the new sm_buffer.cpp source.  The
new source also needed a check so as not to free m_pAllocPtr when it was
equal to m_pMemory, which I had just passed to realloc().  However, with
this release of SNACC and SFL, the result was severely unstable behaviour,
with core dumps occurring consistently, usually in destructors.
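
The aliasing hazard behind that extra check can be sketched generically.
The struct below is a hypothetical stand-in for the three members, not the
real CSM_Buffer class, and assumes that m_pAllocPtr and m_pCache may or may
not point at the same block as m_pMemory.  It records which pointers alias
the block before calling realloc(), because after a successful realloc() the
old address must not be freed or dereferenced again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the buffer state: `memory` is the data block;
// `allocPtr` and `cache` may or may not alias it.  The names only mirror
// m_pMemory, m_pAllocPtr and m_pCache; this is not the SNACC class.
struct BufferState {
    char *memory = nullptr;
    unsigned char *allocPtr = nullptr;
    char *cache = nullptr;
};

// Grows `state.memory` to `newSize` bytes with realloc().  Aliases are
// detected before the call, since afterwards the old address may no longer
// refer to a live block.  An alias of the old block is never freed
// separately: allocPtr is re-pointed at the new block and cache is cleared.
bool grow_block(BufferState &state, std::size_t newSize)
{
    assert(newSize > 0);
    const bool allocAliases = state.allocPtr != nullptr &&
        state.allocPtr == reinterpret_cast<unsigned char *>(state.memory);
    const bool cacheAliases =
        state.cache != nullptr && state.cache == state.memory;

    char *grown = static_cast<char *>(std::realloc(state.memory, newSize));
    if (grown == nullptr)
        return false;  // old block is still valid and still owned by state

    if (allocAliases)
        state.allocPtr = reinterpret_cast<unsigned char *>(grown);
    if (cacheAliases)
        state.cache = nullptr;  // must not be freed: realloc() handled it
    state.memory = grown;
    return true;
}
```

The design choice is that an alias is either re-pointed or cleared, never
freed.  Freeing a pointer that still refers to the pre-realloc() block is a
double free, and heap corruption of that kind can surface much later as a
crash in free(), realloc() or a destructor.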

I also tried adding code to zero-fill the memory newly added to the buffer,
as the supplied code does through its use of calloc() (although calloc()
also needlessly clears the region that memcpy() then overwrites); this had
not been needed with previous releases.  That made things more stable, but
my test suite still dumped core in the realloc() call during one of the
tests, which suggests heap corruption.  It seems that my change interacts
in some way with other code.
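
A realloc()-based replacement for the calloc()+memcpy() pair can zero only
the bytes it adds.  The following is a minimal sketch under that assumption
(grow_zeroed is a hypothetical helper, not an SFL or SNACC function); it
also leaves the original block valid if realloc() fails, rather than losing
it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Grows a heap block from oldSize to newSize bytes with realloc() and
// zero-fills only the newly added tail, so the contents match what the
// calloc()+memcpy() version produced without clearing bytes that are
// about to be overwritten.  Returns nullptr on failure, in which case
// `block` is untouched and still owned by the caller.
char *grow_zeroed(char *block, std::size_t oldSize, std::size_t newSize)
{
    assert(newSize > oldSize);
    char *grown = static_cast<char *>(std::realloc(block, newSize));
    if (grown == nullptr)
        return nullptr;  // zero-fill only after the NULL check
    std::memset(grown + oldSize, 0, newSize - oldSize);
    return grown;
}
```

Zeroing only the tail avoids clearing the region that memcpy() would
overwrite, while still giving callers the zero-initialised extension that
calloc() used to provide.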

My final, unsuccessful attempt at optimising AllocMoreMem() in the current
SNACC release is attached below.  Can anyone tell me how to fix it so as to
achieve acceptable performance with 10MB messages?

John Stark
E-mail: jas@xxxxxxxxxxxx
Tel: +44 (0) 1223 566732
Fax: +44 (0) 1223 566727
Mobile: +44 (0) 7968 110628

--------------------

void CSM_Buffer::AllocMoreMem(SM_SIZE_T lSize)
{
    char *pNew;
    SM_SIZE_T lLength = Length();

    SME_SETUP("CSM_Buffer::AllocMoreMem");

    // Round up the allocation length: if >= 512 bytes but < 64kB, round up
    // to a multiple of 1kB; if >= 64kB, round up to a multiple of 64kB.
    SM_SIZE_T allocLength = lLength + lSize;
    if (allocLength >= 0x10000)
      allocLength = (allocLength + 0xffff) & ~0xffff;
    else if (allocLength >= 0x200)
      allocLength = (allocLength + 0x3ff) & ~0x3ff;

    // Allocate or extend the memory as necessary.
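    // Allocate or extend the memory as necessary.
    if (m_pMemory != NULL)
      pNew = (char *)realloc(m_pMemory, allocLength);
    else pNew = (char *)calloc(1, allocLength);
    if (pNew == NULL) SME_THROW(SM_MEMORY_ERROR, NULL, NULL);

    // Zero the newly added bytes (calloc() already did this for a new block).
    if (m_pMemory != NULL) memset(pNew+lLength, 0x0, lSize);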

    // Update the "file pointer".
    m_pMemFP = pNew + lLength;

    // Clear m_pCache if it points at m_pAllocPtr.
    // if m_pAllocPtr is separate from m_pMemory, free it.
    if (m_pAllocPtr != NULL)
    {
       if (m_pCache == (char *)m_pAllocPtr)
         m_pCache = NULL;
       if (m_pAllocPtr != (unsigned char *)m_pMemory) free(m_pAllocPtr);
       m_pAllocPtr = NULL;
    }

    // Free the cache block if separate.
    if (m_pCache)
    {
       free(m_pCache);
       m_pCache = NULL;
    }

    // Now set the new values.
    m_lCacheSize = lLength + lSize;    // NEW max memory size.
    m_pMemory = pNew;
    m_pAllocPtr = (unsigned char *)pNew;
    m_pAllocPtrSize = m_lCacheSize;

    SME_FINISH_CATCH
}