Phases of Translation

I recently came across a mention of translation while reading about constexpr: namely, that the value of a constexpr object (constexpr functions are a bit different) can be evaluated during translation. That inspired me to dig a bit deeper. If you’re like me, you know about the inputs of translation and its final output, but what about the steps in between?

To rewind, a program consists of one or more translation units which are linked together. What’s in a translation unit? A source file, its headers, and everything else pulled in by #include directives. How the raw lines of code we programmers write are turned into these translation units is defined in 9 “easy” steps (the standard calls them phases) in the standard [lex.phases / 2.2].

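Step 1: The characters of the physical source file are mapped, in an implementation-defined manner, to the basic source character set. Any character outside the basic set is replaced by the universal-character-name that designates it.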

Step 2: Every backslash that is immediately followed by a new-line character is deleted together with that new-line, so the 2 physical lines are spliced into a single logical line.

// String literals on 2 lines
string sourceStr = "ABC" \
                   "123";

// Becomes something like this
string sourceStr = "ABC" "123";
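Here is a small sketch of my own showing that the splice happens no matter where the backslash sits, even in the middle of a string literal or a macro definition. The names kSpliced, SUM_OF_THREE, spliced_length and sum_of_three_demo are made up for illustration.

```cpp
#include <cstring>

// The backslash-newline pair is removed in phase 2, before the literal is
// tokenized, so this is the single literal "ABC123". The continuation line
// starts in column 1 on purpose: leading spaces would end up in the string.
const char* kSpliced = "ABC\
123";

// Multi-line macros rely on the same rule: a directive ends at the first
// new-line that survives phase 2.
#define SUM_OF_THREE(a, b, c) \
    ((a) + (b) + \
     (c))

std::size_t spliced_length() { return std::strlen(kSpliced); }

int sum_of_three_demo() { return SUM_OF_THREE(1, 2, 3); }
```

The string ends up 6 characters long because nothing, not even the new-line, is left between "ABC" and "123".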

Step 3: The source file is decomposed into preprocessing tokens and white space, and each comment is replaced by one space character.
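One way to see the "comment becomes one space" rule in action is to stringize a macro argument that contains a comment. This is only a sketch; STRINGIZE, kTokens and comment_became_space are hypothetical names I picked for it.

```cpp
#include <cstring>

// The # operator turns the preprocessing tokens of its argument into a
// string literal, collapsing any white space between tokens to one space.
#define STRINGIZE(x) #x

// Phase 3 has already replaced the comment with a single space, so the
// macro argument is the two tokens `a` and `b`, not the one token `ab`.
const char* kTokens = STRINGIZE(a/* gone before phase 4 */b);

bool comment_became_space() { return std::strcmp(kTokens, "a b") == 0; }
```

If comments were simply deleted instead of being replaced by a space, the two identifiers would have merged into one.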

NOTE: so far, we have only dealt with the source file.

Step 4: In this step the preprocessing directives are executed, macro invocations are expanded, and _Pragma operator expressions are executed. Every source or header file named in an #include directive is processed through steps 1-4, recursively, and its tokens are added to the source file. All preprocessing directives are then deleted.
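A minimal sketch of what this step leaves behind for the later phases, assuming some made-up names (BUFFER_SIZE, SQUARE, kLargeBuffer, kSquared):

```cpp
#define BUFFER_SIZE 16
#define SQUARE(x) ((x) * (x))

// Conditional inclusion is decided in phase 4; only one of these two
// declarations survives into the later phases.
#if BUFFER_SIZE > 8
constexpr bool kLargeBuffer = true;
#else
constexpr bool kLargeBuffer = false;
#endif

// After macro expansion the rest of the compiler sees ((16) * (16)).
constexpr int kSquared = SQUARE(BUFFER_SIZE);

static_assert(kLargeBuffer, "the #if branch was selected");
static_assert(kSquared == 256, "the macro was expanded before evaluation");
```

By the time the tokens are analyzed, no trace of BUFFER_SIZE or SQUARE is left, only the tokens they expanded to.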

Step 5: Each source character and escape sequence in character literals and string literals is converted to the corresponding member of the execution character set.

Step 6: Adjacent string literal tokens are concatenated.

// Translated in phase 6 to "ABC123"
string testStr = "ABC" "123";
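The order of steps 5 and 6 is observable. In this sketch (kTwoChars is a made-up name), the escape sequence is converted before the two literals are glued together, so it cannot swallow the digit that follows it:

```cpp
// Phase 5 converts the escape sequence \x4 on its own (one character with
// value 4). Only afterwards does phase 6 concatenate the literals, so the
// array holds {'\x04', '1', '\0'} and not the single character '\x41'.
constexpr char kTwoChars[] = "\x4" "1";

static_assert(sizeof(kTwoChars) == 3, "two characters plus the terminator");
static_assert(kTwoChars[0] == 4, "first character comes from the escape \\x4");
static_assert(kTwoChars[1] == '1', "second character is the digit 1");
```

Splitting a literal like this is a handy way to end a hex escape that would otherwise keep consuming the digits after it.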

Step 7: White space separating tokens is no longer significant, and each preprocessing token is converted into a token (there are 5 kinds of tokens: identifiers, keywords, literals, operators, and other separators). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
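As far as I can tell, this analysis is also where constant expressions get evaluated, because things like array bounds and static_assert need their values while the declarations are being checked. A small sketch, with kRows, kCols, kCells and grid as made-up names:

```cpp
constexpr int kRows = 3;
constexpr int kCols = 4;
constexpr int kCells = kRows * kCols;

// An array bound must be a constant expression, so kCells has to be
// known while this declaration is being analyzed.
int grid[kCells] = {};

static_assert(sizeof(grid) == kCells * sizeof(int),
              "the bound was computed during translation");
```

If kCells were only known at run time, neither the array declaration nor the static_assert would compile.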

Step 8: Here each translated translation unit is examined to produce the list of required template instantiations. The template definitions are located, the instantiations are performed, and the results are instantiation units. The translated translation units and the instantiation units are then combined.
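To make the "list of required instantiations" a bit more concrete, here is a sketch with one explicit and one implicit instantiation. The template twice and the function twice_one_and_a_half are hypothetical names of mine.

```cpp
template <typename T>
T twice(T value) { return value + value; }

// Explicit instantiation definition: asks for twice<int> by name.
template int twice<int>(int);

// Calling twice(1.5) puts twice<double> on the list of required
// instantiations without us spelling it out (implicit instantiation).
double twice_one_and_a_half() { return twice(1.5); }
```

Either way, the compiler has to locate the definition of twice and stamp out a concrete function for each set of template arguments that is actually needed.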

Step 9: All external references are resolved, and library components are linked in to satisfy references to entities that are not defined in the current translation. The output is collected into a program image which contains everything needed for its execution.
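A rough single-file sketch of what an external reference looks like from the compiler's point of view. The functions answer, twice_the_answer and name_length are made up; in a real project the definition of answer would normally live in a different .cpp file.

```cpp
#include <cstring>

// Declaration only: while the code below is compiled, answer() is just a
// name that something else has to define.
int answer();

int twice_the_answer() { return 2 * answer(); }

// std::strlen is declared in <cstring>; its definition lives in the
// library and is linked in to satisfy this reference.
std::size_t name_length(const char* name) { return std::strlen(name); }

// Kept in the same file only so that the sketch builds on its own.
int answer() { return 42; }
```

Remove the last definition and the file still compiles, but the link fails with an unresolved reference to answer().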

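Final thoughts: Well, there you have it, the translation phases. As for the constexpr object that started all this: as far as I can tell, its value does not have to wait for step 9. It has to be known while the tokens are analyzed in step 7 (or while templates are instantiated in step 8), since it can be used as an array bound or a template argument. Step 9 only resolves the external references.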
