Reversing C++ can be a pain, and part of the reason is that a C++ std::string is dynamically sized. Its appearance in memory is therefore more complex than the char[] you would find in C, because std::string actually contains three fields:
Pointer to the allocated memory (the actual string itself)
Logical size of the string
Size of the allocated memory, i.e. the capacity (which must be greater than or equal to the logical size)
The string content itself is allocated dynamically on the heap, so the std::string object only holds the pointer to that buffer plus the two size fields.
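A minimal C++ sketch of that three-field layout, assuming a heap-only implementation with no small-string handling. SimpleString, make_simple_string and free_simple_string are hypothetical names chosen for illustration; they are not any standard library's real internals, and real libraries may order the fields differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical heap-only string layout; the field names are illustrative.
struct SimpleString {
    char*       buf;            // pointer to the heap-allocated characters
    std::size_t len;            // logical size, not counting the '\0'
    std::size_t allocated_len;  // capacity of buf, not counting the '\0'
};

// Copy `text` into a freshly allocated SimpleString.
SimpleString make_simple_string(const char* text) {
    SimpleString s;
    s.len = std::strlen(text);
    s.allocated_len = s.len;  // a real library may round the capacity up
    s.buf = static_cast<char*>(std::malloc(s.allocated_len + 1));  // +1 for '\0'
    if (s.buf == nullptr) std::abort();
    std::memcpy(s.buf, text, s.len + 1);  // copies the terminator too
    assert(s.allocated_len >= s.len);     // capacity >= logical size
    return s;
}

// Release the heap buffer and reset the fields.
void free_simple_string(SimpleString& s) {
    std::free(s.buf);
    s.buf = nullptr;
    s.len = 0;
    s.allocated_len = 0;
}
```

In a memory dump of such an object you would see three machine words: an address pointing into the heap, followed by two small integers.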
This layout is not standardised: it varies between compilers, standard libraries and even library versions, which is one reason many decompilers don't recognise strings immediately.
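One way to see what your own toolchain does is to copy a real std::string object's raw bytes into pointer-sized words and check which words hold the data pointer and the size. This is a sketch: dump_words, holds_data_pointer and holds_size are hypothetical helper names, and it assumes sizeof(std::string) is a multiple of the pointer size, which holds on mainstream 64-bit targets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copy the object representation of `s` (not the characters it points to)
// into pointer-sized words, roughly what a reverser sees in a memory dump.
std::vector<std::uintptr_t> dump_words(const std::string& s) {
    std::vector<std::uintptr_t> words(sizeof(std::string) / sizeof(std::uintptr_t));
    assert(!words.empty());  // a std::string spans at least one word in practice
    std::memcpy(words.data(), &s, words.size() * sizeof(std::uintptr_t));
    return words;
}

// Does some word of the object equal the address returned by data()?
bool holds_data_pointer(const std::string& s) {
    const std::vector<std::uintptr_t> words = dump_words(s);
    const auto target = reinterpret_cast<std::uintptr_t>(s.data());
    return std::find(words.begin(), words.end(), target) != words.end();
}

// Does some word of the object equal the logical size?
bool holds_size(const std::string& s) {
    const std::vector<std::uintptr_t> words = dump_words(s);
    const auto target = static_cast<std::uintptr_t>(s.size());
    return std::find(words.begin(), words.end(), target) != words.end();
}
```

The checks search the whole object rather than a fixed offset, because the position of each field is exactly what differs between implementations. Use a long string (for example std::string(100, 'x')) so the characters live on the heap; printing dump_words for the same string under different compilers shows the differences directly.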
Decompilers can confuse us even more depending on how the compiler optimises small objects. Simply put, we would prefer to avoid allocating space on the heap unless absolutely necessary, so if the string is short enough, it is stored within the std::string struct itself. For example, an implementation can reserve a small buffer, local_buf, inside the struct. If the string is 8 bytes or less, it is stored in local_buf instead, buf points at local_buf, and no heap allocation is used.
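A minimal sketch of that small-buffer scheme, assuming an 8-byte local_buf that shares storage (a union) with the capacity field, since the capacity is only needed when the string is on the heap. SsoString, is_local, sso_init and sso_destroy are hypothetical names; the sketch also assumes the 8 bytes include the terminating '\0', so at most 7 characters fit inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical small-string layout: buf, len, and a union holding either an
// inline buffer (short strings) or the heap capacity (long strings).
struct SsoString {
    char*       buf;  // points at local_buf or at a heap block
    std::size_t len;  // logical size, not counting the '\0'
    union {
        char        local_buf[8];   // inline storage, including the '\0'
        std::size_t allocated_len;  // heap capacity, used only for long strings
    };
};

// True when the characters live inside the object itself.
bool is_local(const SsoString& s) { return s.buf == s.local_buf; }

void sso_init(SsoString& s, const char* text) {
    s.len = std::strlen(text);
    if (s.len + 1 <= sizeof(s.local_buf)) {  // fits together with its '\0'
        s.buf = s.local_buf;                 // no heap allocation
    } else {
        s.allocated_len = s.len;
        s.buf = static_cast<char*>(std::malloc(s.len + 1));
        if (s.buf == nullptr) std::abort();
    }
    std::memcpy(s.buf, text, s.len + 1);  // copies the terminator too
    assert(is_local(s) || s.allocated_len >= s.len);
}

void sso_destroy(SsoString& s) {
    if (!is_local(s)) std::free(s.buf);  // only heap buffers are freed
    s.buf = nullptr;
    s.len = 0;
}
```

Because buf can point into the object itself, a plain bytewise copy of an SsoString would leave the copy pointing at the original's local_buf; real implementations handle this in their copy and move operations. For a reverser, the practical sign of this optimisation is a pointer field whose value is the address of the object plus a small offset.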
An analysis of different compilers' approaches to Small Object Optimization can be found here.