Strings in C++

Basic Strings

Reversing C++ can be a pain, partly because a std::string is dynamically sized. Its appearance in memory is therefore more complex than the char[] you would find in C: a std::string typically contains three fields:

  • Pointer to the allocated memory (the actual string itself)

  • Logical size of string

  • Size of allocated memory (which must be bigger than or equal to logical size)

The actual string content is dynamically allocated on the heap. As a result, std::string looks something like this in memory:

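// Simplified layout of std::string (illustrative, not a real definition)
struct string
{
    char* buf;             // pointer to the heap-allocated characters
    size_t len;            // logical size of the string
    size_t allocated_len;  // size of the allocation (>= len)
};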

This layout is not guaranteed. The C++ standard does not specify it, so it varies between compilers, standard library implementations and versions, which is why many decompilers don't recognise strings immediately.
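Although the field order differs between implementations, the three logical fields are always reachable through the public API, which gives a portable way to see which values to look for in a memory dump. The following is a minimal sketch: StringFields and inspect are hypothetical names introduced here for illustration, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the three logical fields of a std::string,
// read through its public accessors rather than its raw memory.
struct StringFields {
    const char* buf;        // pointer to the character data
    std::size_t len;        // logical size of the string
    std::size_t allocated;  // capacity, always >= len
};

// Collect the three fields for a given string.
StringFields inspect(const std::string& s) {
    return StringFields{s.data(), s.size(), s.capacity()};
}
```

Calling inspect on a string and comparing the result with the raw bytes of the object (sizeof(std::string) bytes starting at its address) is one way to work out which offset holds which field for a particular compiler.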

Small Object Optimization

Compilers can confuse us even more depending on how their standard library optimises small objects (for std::string this is usually called Small String Optimization, or SSO). Simply put, implementations prefer to avoid allocating space on the heap unless absolutely necessary, so if the string is short enough, it is stored within the std::string struct itself. For example:

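// Simplified layout of std::string with a small buffer (illustrative)
struct string
{
    char* buf;
    size_t len;

    // a union stores its members in the same memory location;
    // this saves space because only one of them is in use at a time
    union
    {
        size_t allocated_len;  // used when the string lives on the heap
        char local_buf[8];     // used when the string is short enough
    };
};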

In this example, if the string fits in 8 bytes (including the null terminator), local_buf is used and the string is stored there instead. buf then points at local_buf, and no heap allocation takes place.
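One way to observe this behaviour on a given compiler is to check whether the data pointer falls inside the std::string object itself. The sketch below assumes nothing about the internal layout; stored_inline is a hypothetical helper name, and the length at which strings stop being stored inline depends on the implementation.

```cpp
#include <cstdint>
#include <string>

// Returns true if the character data lives inside the std::string object
// itself (small-buffer case) rather than in a separate heap allocation.
bool stored_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end = obj_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}
```

A string longer than sizeof(std::string) can never be stored inline, so stored_inline returns false for it. Calling the function on strings of increasing length shows where a particular implementation switches from the local buffer to the heap.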

An analysis of different compilers' approaches to Small Object Optimization can be found here.
