Literally: string literals – part 1

String literals are a topic that often comes up when writing performance-critical sections of code. Most developers don’t give literals much thought since, in general, they just work. But then you might do some profiling and realize you’re making an excessive number of calls to string-related functions.

Ordinary string literals (aka narrow string literals) are arrays of n const char, where n is the number of characters plus one for the terminating null. A string literal also has static storage duration. One reason they are “const” is that the standard allows implementations to optimize how they are stored, which means they must not be modified:


“Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified. [ Note: The effect of attempting to modify a string literal is undefined. —end note ]”

(C++ standard, [lex.string])

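This means two different variables initialized with the same string literal might point to the same memory. It could also mean that literals overlap: “DEF” could be stored as the tail of “ABCDEF”, since both end with the same characters and the same terminating null. (A prefix such as “ABC” cannot be shared this way, because its terminator would have to sit where the ‘D’ is.) Hence modifying any part of a string literal is undesirable, to say the least.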
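To see what your own compiler does with identical literals, you can compare the two addresses. This is only a sketch: the names first, second, literals_share_storage and report_sharing are made up for the illustration, and the printed answer is unspecified, so it may change between compilers or optimization levels.

```cpp
#include <cstdio>

// Two pointers initialized from identical literals.
constexpr const char* first  = "ABC";
constexpr const char* second = "ABC";

// True if this build stored both literals in the same place.
// Either answer is conforming, so never rely on it.
bool literals_share_storage()
{
    return first == second;
}

// Prints which choice this compiler made.
void report_sharing()
{
    std::printf("same storage: %s\n", literals_share_storage() ? "yes" : "no");
}
```

Whatever the answer is, the contents are the same. Only the addresses are up to the implementation.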

As part of the phases of translation, adjacent string literals are concatenated, which lets you split a long string across multiple lines:

const char* multi_line_string = "First line "
                                "second line.";

becomes this: "First line second line."

And finally, once all the concatenations have occurred, a terminating null character ‘\0’ is appended so that programs can identify the end of the string literal.

"First line second line.\0"
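A quick way to convince yourself of both steps is to let the compiler check the resulting array. The sketch below assumes a hypothetical constexpr array named joined that holds the same two literals as above.

```cpp
// Adjacent literals are joined first, then the terminator is appended.
constexpr char joined[] = "First line "
                          "second line.";

// 23 visible characters plus the terminating '\0'.
static_assert(sizeof(joined) == 24, "array size includes the terminator");

// Nothing is inserted at the join: the space comes from the first literal.
static_assert(joined[10] == ' ' && joined[11] == 's', "literals are glued as-is");

// The last element is the terminator added after concatenation.
static_assert(joined[sizeof(joined) - 1] == '\0', "ends with a null character");
```

Note that only one terminator is added for the whole concatenated string, not one per piece.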

Before we go any further and this being C++, it’s time to talk types.

“ABC” is of type const char[4]: three characters plus the terminating null. It will decay to a const char*. This is an important distinction when we refer to string literals: an array of char implicitly decays to a pointer to char, but the two are different types. There are some implications:

  • char str_arr[] = "Hello String";
    • memory: as a local variable, the array lives on the stack and is initialized with a copy of each character of the string literal.
    • constness: you can modify the chars in the array.
  • const char* str_ptr = "Hello String";
    • memory: points to the string literal itself, which has static storage duration.
    • constness: the underlying data is const. Since C++11, initializing a plain char* from a literal is ill-formed, and modifying the data through a cast is undefined behavior.
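The difference between the two declarations can be checked with a few type traits. This is a minimal sketch: array_bytes and edited_copy_first_char are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of array type, so decltype gives a
// reference to an array of const char.
static_assert(std::is_same<decltype("ABC"), const char(&)[4]>::value,
              "\"ABC\" is an array of 4 const char");

// Decay drops the size and leaves a pointer to const char.
static_assert(std::is_same<std::decay<decltype("ABC")>::type, const char*>::value,
              "the literal decays to const char*");

// 12 characters plus the terminating '\0'.
constexpr std::size_t array_bytes = sizeof("Hello String");
static_assert(array_bytes == 13, "sizeof sees the whole array");

// Edits a local copy of the literal and returns its first character.
// The literal itself is never modified.
inline char edited_copy_first_char()
{
    char str_arr[] = "Hello String";       // writable copy, automatic storage
    const char* str_ptr = "Hello String";  // points at the static literal
    str_arr[0] = 'J';                      // fine: we own this copy
    // str_ptr[0] = 'J';                   // would not compile: data is const
    static_assert(sizeof(str_arr) == 13, "the array keeps its size");
    static_assert(sizeof(str_ptr) == sizeof(const char*), "the pointer does not");
    return str_arr[0];
}
```

The two sizeof checks inside the function are the practical consequence: the array still knows how long it is, the pointer does not.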

This has implications for determining the length of the string.

With a const char*, you can use strlen, which scans for the null terminator at run time and returns the length it counted. With an array of char, the length of the array is known at compile time. We can use a template to capture that information:

template< size_t N >
constexpr size_t length(const char(& myarr)[N])
{
	return N - 1;
}

There’s a bit to pick apart here. But, to get back to where we started, let’s remember the type:

“ABC” is of type const char[4]

And

“ABCD” is of type const char[5].

This means that, during template instantiation, the compiler generates a separate specialization for each array type, with the length already “baked” into the function. For example:

// Illustration only: roughly what the compiler instantiates for N = 4
template<>
constexpr size_t length<4>(const char(& p_mystr)[4])
{
	return 4 - 1;
}

char mystr[] = "ABC";
size_t len = length(mystr);

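The slightly odd parameter syntax is how we pass the char array by reference. The parentheses are necessary: without them, const char& myarr[N] would declare an array of references, which is not allowed. The parameter name itself is optional.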
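Here is a small sketch of what taking the array by reference buys us, and where it can surprise you. The helper runtime_length and the arrays greeting and padded are hypothetical names added for the comparison; length is the same template as above.

```cpp
#include <cstddef>
#include <cstring>

// Same helper as above: N is deduced from the array type.
template <std::size_t N>
constexpr std::size_t length(const char (&myarr)[N])
{
    return N - 1;
}

// Pointer version: the size is gone, so we scan for '\0' at run time.
inline std::size_t runtime_length(const char* str)
{
    return std::strlen(str);
}

constexpr char greeting[] = "ABC";  // const char[4]
constexpr char padded[16] = "ABC";  // const char[16], the rest is '\0'

static_assert(length(greeting) == 3, "array size minus the terminator");
static_assert(length("ABCD") == 4, "literals bind to the reference directly");

// Caveat: the template reports capacity, not content.
// runtime_length(padded) would return 3 here.
static_assert(length(padded) == 15, "counts the whole buffer");

// const char* ptr = greeting;
// length(ptr);  // does not compile: there is no N to deduce from a pointer
```

So the template is a good fit for literals and exactly-sized arrays, while strlen remains the tool for pointers and partially filled buffers.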

Since the function is also constexpr, the result can be computed entirely at compile time. Looking at the assembly generated by gcc 9.1, the length (3 for the string “ABC”) is moved directly into the register.

You can view the code here which sets the optimizer to -O3 for both gcc and clang.
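If you would rather not depend on the optimizer, you can use the result somewhere the language requires a constant expression. In this sketch, abc_len and buffer are hypothetical names, and length is the same template as above.

```cpp
#include <array>
#include <cstddef>

template <std::size_t N>
constexpr std::size_t length(const char (&myarr)[N])
{
    return N - 1;
}

// Initializing a constexpr variable requires a constant expression,
// so this call is evaluated by the compiler at any optimization level.
constexpr std::size_t abc_len = length("ABC");
static_assert(abc_len == 3, "known before the program runs");

// The result can also size another type: room for "ABC" plus '\0'.
std::array<char, length("ABC") + 1> buffer{};
```

A plain call such as size_t len = length(mystr); is only eligible for compile-time evaluation. The contexts above are the ones that require it.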

That’s it for part 1 now. More on literals and literal operators next time.
