A bit about Qt Creator’s C++ model

In the past two and a half years I had the pleasure to develop Qt Creator. This was while Qt was still part of Nokia, which, despite recent events, is a great company to work for. It provided me an enlightening environment and the opportunity to be surrounded by good people.

For those who do not know, Qt Creator is an awesome cross-platform C++/QML (and others) IDE. When it started in the open-governance model I acted as community maintainer for the text editor and C++ language support, modules which I was particularly involved with. I really enjoyed it!

Creator’s C++ model is pretty good, but it’s not very well documented. Since the topic of compilers is certainly not the most trivial one, this might be frustrating in the beginning. Therefore, I decided to try to give a hand to those who would like to contribute to it at the Qt Project.

What is the C++ model?

Just to make sure we are on the same page: The C++ model is the engine behind pretty much every feature that gives you insight about your code. We are talking about completion, syntactic and semantic highlighting, symbol navigation, class view, type hierarchies, and so on. We are not talking about the compiler setup in your toolchain.

Originally, the C++ model (including the parser) was designed by Roberto Raggi. Over the years several different people have been involved in extending, refactoring, fixing bugs, and generally improving it. If you need any hint or explanation, probably the best way to go is to git blame the code in question and ask for help on the #qt-creator IRC channel. You might be lucky. ;-)

If you want to follow closely, please consider grabbing Qt Creator’s source code.

The front-end foundations

In order to begin the entire process, the C++ model needs to collect certain “properties” from the project like include paths or defines. This is the responsibility of the project manager. Take a look at Qt4ProjectManager::Qt4Project::updateCppCodeModel – other types of projects are handled by their corresponding managers – and search for the call to CppTools::CppModelManagerInterface::updateSourceFiles. This is when the C++ magic begins.

Qt Creator has its own preprocessor, lexer, and parser. (Yes, before I hear someone saying… There’s indeed a wip/clang branch, but hang on for a moment.) The preprocessor has been undergoing a few refactorings lately and it’s now much better. You’ll find it as CPlusPlus::Preprocessor and be aware that it has quite a few tests. Don’t confuse it with CppTools::CppPreprocessor, which is simply the client for the callbacks provided by the actual preprocessing engine.

I guess it’s clear that the C++ model does not run your compiler, right? Good… But notice that your compiler does have an influence over the preprocessing phase. This is because the compiler’s predefined macros (when available) are taken into consideration. Therefore different code paths might result from #ifdefs.

A hint: During the indexing of the project, the preprocessor will expand all the macros, but that’s not what happens for files opened in the editor. In this case function-like macros are not expanded because we want to precisely match the editor’s content, and if you trigger a feature like find usages you expect to see the foo in Q_ASSERT(foo) even when the project is in release mode. I actually think it’s possible to accomplish this through an alternative way by tracking the expanded elements. I had an experimental patch for this, but while writing this post I discovered that I lost it. :(
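To illustrate why expansion is skipped for editor content, here is a toy sketch. It is not Creator’s preprocessor: the token stream is just a list of strings, macro parameters are ignored, and the names preprocessSketch, MacroTable and MY_ASSERT are made up for this example. MY_ASSERT plays the role of an assertion macro that expands to nothing in release mode.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
// Hypothetical macro table: macro name -> replacement tokens (parameters ignored).
using MacroTable = std::map<std::string, Tokens>;

// Copies the input tokens; when expandFunctionLike is true, every
// `NAME ( ... )` invocation of a known macro is replaced by its replacement.
Tokens preprocessSketch(const Tokens &input, const MacroTable &macros,
                        bool expandFunctionLike)
{
    Tokens output;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const auto macro = macros.find(input[i]);
        const bool isInvocation = macro != macros.end()
                && i + 1 < input.size() && input[i + 1] == "(";
        if (!expandFunctionLike || !isInvocation) {
            output.push_back(input[i]); // keep the token exactly as typed
            continue;
        }
        // Skip the balanced argument list of the invocation.
        int depth = 0;
        std::size_t j = i + 1;
        for (; j < input.size(); ++j) {
            if (input[j] == "(")
                ++depth;
            else if (input[j] == ")" && --depth == 0)
                break;
        }
        output.insert(output.end(), macro->second.begin(), macro->second.end());
        i = j; // continue after the closing parenthesis
    }
    return output;
}
```

With MY_ASSERT defined as empty, running this over the tokens of MY_ASSERT(foo); in “indexing” mode drops foo entirely, while “editor” mode keeps it, so a find usages on foo still has something to match.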

One interesting thing to do would be to print the output of the preprocessor. Then it’s easy to understand what is kept around for the macro expansion scenarios and details about the involved tokens. Furthermore, you’ll realize we also carry information to identify what is “real” and what is not in the editor. Specifically, we use flags for generated or expanded tokens. To get an idea, you could inspect the content of the local variable preprocessedCode in CppTools::CppPreprocessor::sourceNeeded and the call to CPlusPlus::Snapshot::preprocessedDocument in CppEditor::SemanticHighlighter::semanticInfo. Check the differences in the resulting content…

Lexing happens after preprocessing, and there’s no tight integration between the two – not to be confused with the fact that the preprocessor does use the same lexer internally along with CPlusPlus::Preprocessor::PPToken. Class CPlusPlus::Lexer has a few options for keeping comments and for enabling C++11 or Qt specifics. It’s operated by the CPlusPlus::TranslationUnit, which will in turn carry a list of tokens and information about their positions in terms of lines and columns.

The token class, CPlusPlus::Token, is straightforward and tracks offsets, lengths, and flags. Besides those I mentioned above, there are also flags to indicate whether the token is at the beginning of the line or is joined with the previous one. Literals, which are managed by CPlusPlus::Control, are also annotated on the tokens during lexing.
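To make this a bit more concrete, here is a minimal sketch of what such a token record might look like. It is not the real CPlusPlus::Token: the struct name, the field names and the helper spell are assumptions chosen to mirror the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical token record; all names are illustrative, not Creator's API.
struct TokenSketch {
    std::uint32_t offset = 0;  // position of the first character in the source
    std::uint32_t length = 0;  // number of characters covered by the token
    bool newline = false;      // token is at the beginning of a line
    bool joined = false;       // token is joined with the previous one
    bool expanded = false;     // token results from a macro expansion
    bool generated = false;    // token has no counterpart in the editor text

    std::uint32_t end() const { return offset + length; }
};

// Returns the text a token covers in the given source buffer.
inline std::string spell(const std::string &source, const TokenSketch &tok)
{
    assert(tok.end() <= source.size());
    return source.substr(tok.offset, tok.length);
}
```

Storing only an offset and a length keeps the record small; the spelling can always be recovered from the source buffer when it is actually needed.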

Another place where it’s worth illustrating the role of CPlusPlus::Lexer is in the internals of CPlusPlus::SimpleLexer. The latter is used whenever an isolated chunk of code needs to be unravelled. A state may be supplied if we are, for example, in the middle of a comment. A good place to see it working is in the syntax highlighter, inside CppEditor::CppHighlighter::highlightBlock.
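The idea of carrying a state from one chunk to the next can be sketched as follows. This is a heavily simplified stand-in, not CPlusPlus::SimpleLexer: it only knows about /* ... */ comments (no string literals, no // comments), and the names LexState, LineResult and scanLine are invented for the example.

```cpp
#include <cstddef>
#include <string>

enum class LexState { Normal, InBlockComment };

struct LineResult {
    std::size_t commentChars = 0;          // characters inside /* ... */ comments
    LexState endState = LexState::Normal;  // state to pass to the next line
};

// Scans a single line, starting in the state left over by the previous line.
LineResult scanLine(const std::string &line, LexState state)
{
    LineResult result;
    std::size_t i = 0;
    while (i < line.size()) {
        if (state == LexState::InBlockComment) {
            if (line.compare(i, 2, "*/") == 0) {
                result.commentChars += 2;
                i += 2;
                state = LexState::Normal;
            } else {
                ++result.commentChars;
                ++i;
            }
        } else if (line.compare(i, 2, "/*") == 0) {
            result.commentChars += 2;
            i += 2;
            state = LexState::InBlockComment;
        } else {
            ++i; // ordinary code character
        }
    }
    result.endState = state;
    return result;
}
```

A highlighter would call something like this once per text block, feeding the endState of one line in as the start state of the next, so a comment opened on one line is still recognized on the following ones.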

Once the tokens are identified, it’s time for syntax analysis. Qt Creator has a predictive recursive descent parser, CPlusPlus::Parser. Given the ambiguity in the C++ grammar and the fact the language is context dependent, the parser also supports backtracking. Error recovery will in general attempt to find an acceptable token for synchronization – in the case of an editor this is particularly important since quite often you would be dealing with an incomplete file.

Qt Creator’s parser doesn’t rely on a symbol table – I’ll discuss more about this later on. Therefore, it will not always construct a correct AST (Abstract Syntax Tree). For instance, consider the following sequence of tokens: x * y. In a standard-conformant compiler this would be parsed as a declaration if the name x is associated with a type and as an expression otherwise. But if you try something like void f() { int x, y; x * y; } in Qt Creator you’ll notice it’s parsed as a declaration and you’ll even get an unused-variable warning. (Here, it could keep both ASTs for a while.)
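The difference between the two strategies can be sketched in a few lines. This is only an illustration of the x * y case, not how CPlusPlus::Parser is written: the real parser backtracks over the full grammar, while this toy just recognizes the one ambiguous token shape. All names here are made up, and the “prefer a declaration” rule is my simplification of the behaviour described above.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class StatementKind { Declaration, Expression };

static bool isIdentifier(const std::string &s)
{
    if (s.empty() || std::isdigit(static_cast<unsigned char>(s[0])))
        return false;
    for (char c : s) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            return false;
    }
    return true;
}

// True for the ambiguous token shape `identifier * identifier ;`.
static bool isAmbiguousStarStatement(const std::vector<std::string> &t)
{
    return t.size() == 4 && isIdentifier(t[0]) && t[1] == "*"
            && isIdentifier(t[2]) && t[3] == ";";
}

// Compiler-like approach: the symbol table decides whether the first name is a type.
StatementKind classifyWithSymbolTable(const std::vector<std::string> &tokens,
                                      const std::set<std::string> &typeNames)
{
    if (isAmbiguousStarStatement(tokens) && typeNames.count(tokens[0]) > 0)
        return StatementKind::Declaration;
    return StatementKind::Expression;
}

// Symbol-free approach: whatever can be read as a declaration is treated as one.
StatementKind classifyWithoutSymbolTable(const std::vector<std::string> &tokens)
{
    return isAmbiguousStarStatement(tokens) ? StatementKind::Declaration
                                            : StatementKind::Expression;
}
```

For the tokens of x * y; the symbol-free version always answers “declaration”, whereas the symbol-table version answers “expression” unless x is a known type name.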

Don’t expect much if you’re too smart when playing around with header inclusions. Creator will not re-parse your header every time it sees it. Instead, it will just merge the data it has already gathered from it. This behaviour allows better scaling and it could somehow be compared to the notion of precompiled headers. Surprises in the use of macros might also appear as a consequence of that, as I point out in the following section.

It’s also possible to request that a particular code chunk is parsed as an expression, declarator, declaration or statement, specifically. This comes in quite handy in certain scenarios like code completion, formatting or refactoring operations. There’s a dumper in the test/tools/cplusplus directory which can be used for playing around and inspecting what an AST looks like.

The AST is the fundamental piece for everything else that comes afterwards. Most of the knowledge about your code is gathered through AST visitors. Just for curiosity, open the type hierarchy of CPlusPlus::ASTVisitor so you can see all the classes that derive from it.

One of the classes that appears in the widget is CPlusPlus::Bind. This is the entity responsible for creating the actual symbols from your code. If you remember what I said above about the parser not relying on a symbol table, it should make sense now: There’s a second pass for the semantic analysis. Navigate to CPlusPlus::Document::check and you should be able to connect the dots.

Let’s get some insight into how the binding happens. As an example, I will pick the overload of CPlusPlus::Bind::visit for CPlusPlus::FunctionDefinitionAST*. The relevant behavior to understand here is that the AST in question is traversed and the grammar components examined. Eventually a symbol of type CPlusPlus::Function gets created and added to its enclosing scope, which is sequentially nested from the top-level namespace. In addition, the symbol is also annotated on the AST.
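A stripped-down version of that binding pass could look like the sketch below. None of these types are Creator’s: AstNode, Scope, Symbol and BindSketch are hypothetical stand-ins with just two node kinds, meant only to show the three things mentioned above (traversal, symbol creation in the enclosing scope, and annotation of the AST).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Scope;

struct Symbol {
    std::string name;
    Scope *enclosing = nullptr;
};

struct Scope {
    std::string name;                               // empty for the global scope
    Scope *parent = nullptr;
    std::vector<std::unique_ptr<Symbol>> symbols;   // symbols declared here
    std::vector<std::unique_ptr<Scope>> children;   // nested scopes
};

struct AstNode {
    enum Kind { Namespace, FunctionDefinition } kind;
    std::string name;
    std::vector<AstNode> children;
    Symbol *symbol = nullptr;                       // annotation filled in by the binder
};

class BindSketch {
public:
    explicit BindSketch(Scope *global) : m_scope(global) { assert(global); }

    void visit(AstNode &node)
    {
        if (node.kind == AstNode::Namespace) {
            // Open a nested scope, bind the children in it, then restore.
            auto nested = std::make_unique<Scope>();
            nested->name = node.name;
            nested->parent = m_scope;
            Scope *previous = m_scope;
            m_scope = nested.get();
            previous->children.push_back(std::move(nested));
            for (AstNode &child : node.children)
                visit(child);
            m_scope = previous;
        } else {
            // Create the function symbol in the current scope and annotate the AST.
            auto function = std::make_unique<Symbol>();
            function->name = node.name;
            function->enclosing = m_scope;
            node.symbol = function.get();
            m_scope->symbols.push_back(std::move(function));
        }
    }

private:
    Scope *m_scope;
};

// Walks the enclosing scopes outwards to build a name like "app::main".
std::string qualifiedName(const Symbol &symbol)
{
    std::string result = symbol.name;
    for (const Scope *s = symbol.enclosing; s && !s->name.empty(); s = s->parent)
        result = s->name + "::" + result;
    return result;
}
```

Because the symbol pointer is stored on the node, later passes that start from the AST can get to the symbol directly instead of looking it up again by name.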

I hope this overview gives hints of the design choices in Qt Creator’s parser. It’s lightweight, relatively fast, and when possible it deliberately postpones verifications to the semantic phase.

Building the intelligence

The data about your code is all there. It remains to properly understand it so it can be presented in a useful way. A key component in this story is the CPlusPlus::Document. As the name implies, it represents a particular C++ file from a “technical” perspective. It carries around, among others, the source code, the macro usages, the text revision, and a few of the parsing items discussed earlier. Whenever the front-end finishes its tasks on a file it will emit a CppTools::CppModelManagerInterface::documentUpdated signal. This gives an opportunity to capture the newest live version of a document.

As you might guess, one of the consumers of this event is the CppEditor::CPPEditorWidget. With hands on the received document, the C++ editor can now request the CppEditor::SemanticHighlighter to semantically highlight the file. This explains why the source code formatting changes more than once: The first time it’s purely about syntax, as I already described. The following ones will beautify the user defined types, identify class members, select all references for the name currently under the cursor, and other neat things.

The CPlusPlus::Document is essential for a variety of features. Think of the C++ outline that you see in the left pane or the navigation bar on top of the editor… They “simply” map the symbols from the document to a visually appealing form – see CPlusPlus::OverviewModel. What about the type hierarchy? Again, it’s just a visitor to the symbols of the document and its dependants, as CPlusPlus::DerivedHierarchyVisitor shows. Notice that I named the hierarchy for the derived types, because for the base types the approach is different and I’ll come back to it.

OK, the symbols are available. However, they are loose pieces that need to be assembled together in order to enable a pragmatic interpretation of the project’s source. This is achieved through an interplay between the CPlusPlus::LookupContext, CPlusPlus::CreateBindings, and CPlusPlus::ClassOrNamespace. But before we delve deeper into those, it would be appropriate to introduce what we call the snapshot.

There’s nothing really fancy to tell about CPlusPlus::Snapshot. It’s basically a collection of all C++ documents from the Creator’s session. There’s a main snapshot which is held by the model manager and mutex protected. Most of the time you’ll find yourself working with some copy of it in which you replace an “old” document by a newer (or maybe just modified) version of it. Nevertheless, there’s a tricky detail. It’s relatively common to deal with symbols from different files and to inadvertently forget about their originating documents. Then, suddenly a crash happens due to a dead symbol. Therefore, it’s generally a good idea to always keep the snapshot around until you finish your stuff so the symbols are kept alive.
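The lifetime issue is easier to see with a small model. The sketch below assumes documents are shared via reference counting and that each document owns its symbols; the type names (SnapshotSketch, DocumentSketch, SymbolSketch) are invented and say nothing about how CPlusPlus::Snapshot is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct SymbolSketch {
    std::string name;
};

// A document owns the symbols that were created from its source.
struct DocumentSketch {
    std::string fileName;
    unsigned revision = 0;
    std::vector<SymbolSketch> symbols;
};

using DocumentPtr = std::shared_ptr<const DocumentSketch>;

// Copying a snapshot is cheap: only the shared pointers are copied.
class SnapshotSketch {
public:
    // Inserts a document, replacing any older version of the same file.
    void insert(DocumentPtr doc)
    {
        assert(doc);
        const std::string key = doc->fileName;
        m_documents[key] = std::move(doc);
    }

    DocumentPtr document(const std::string &fileName) const
    {
        const auto it = m_documents.find(fileName);
        return it == m_documents.end() ? DocumentPtr() : it->second;
    }

private:
    std::map<std::string, DocumentPtr> m_documents;
};
```

If you take a copy of the snapshot before grabbing a raw symbol pointer, the copy keeps the old document (and therefore the symbol) alive even after the main snapshot has moved on to a newer revision. Drop the copy too early and the pointer dangles.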

In the last paragraph, the attentive reader observed that I said session instead of project when explaining the snapshot. Yes, this is right, Creator doesn’t know the project the symbols belong to. This avoids a lot of duplication (think of library files) but has consequences. For instance, macros will be merged, which might look suspicious (the content is grayed out) when editing files that are part of different projects with different setups. Anyway, it’s practically impossible to handle this in a true sense, unless we create a way for a file to be “viewed as from this project, under those settings, under that platform”.

Back to where we were… The CPlusPlus::LookupContext has a quite meaningful name: It provides the mechanism to search for names and symbols inside a certain scope. A supporting visitor, CPlusPlus::CreateBindings, injects class and function symbols into their corresponding namespaces (or any other enclosing entity). Properties of classes, like their bases, are also identified. CPlusPlus::ClassOrNamespace exposes fine-grained information about the C++ constructs it’s named after and also contains logic for matching names inside scopes. There’s a lot of template instantiation handling there as well.

Concerning the base part of the type hierarchy I previously mentioned, it’s relatively trivial to explain now. In the case of a class, the method CPlusPlus::ClassOrNamespace::usings gives precisely its bases. Consequently, it takes no more than a BFS (breadth-first search) to put together a hierarchy. This is exactly what is done in CppEditor::CppClass::lookupBases.
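As a rough illustration of that traversal, here is a generic breadth-first search over a “direct bases” table. It is a sketch under assumptions, not the code from CppEditor::CppClass::lookupBases: the table BaseMap stands in for what usings would return, and the function name lookupBasesSketch is made up.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the binding information: class -> direct bases.
using BaseMap = std::map<std::string, std::vector<std::string>>;

// Returns all direct and indirect bases of `klass` in breadth-first order.
std::vector<std::string> lookupBasesSketch(const BaseMap &directBases,
                                           const std::string &klass)
{
    std::vector<std::string> order;
    std::set<std::string> seen{klass};   // guards against revisiting shared bases
    std::queue<std::string> pending;
    pending.push(klass);

    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = directBases.find(current);
        if (it == directBases.end())
            continue;                    // no known bases for this class
        for (const std::string &base : it->second) {
            if (seen.insert(base).second) {
                order.push_back(base);
                pending.push(base);
            }
        }
    }
    return order;
}
```

Fed with QPushButton, QAbstractButton and QWidget and their direct bases, this yields the closest bases first, which is the order you would want when laying out a hierarchy view level by level.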

It still remains to talk about a few relevant components of the C++ model and I believe that code completion makes a good use case. Suppose you press the dot operator immediately after variable hello, which is an object of a class type. Qt Creator will then invoke the code assist API (which is by the way so cool and it runs out of the GUI thread) and eventually reach CppTools::CppCompletionAssistProcessor::perform. Then, the first step is to gather the triggering activation sequence, in this case the dot, and what text cursor position delimits such expression. This is done with pure text inspection, so nothing special.

Later on we use the delimiting position to identify the entire expression. In order to accomplish that CPlusPlus::ExpressionUnderCursor is invoked. Under the hood it will request the CPlusPlus::BackwardsScanner to “reversely” lex the desired chunk. This doesn’t change lexing itself, but it adds the ability to jump up in the editor for the case the expression originated in a previous line.
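The flavour of this backward scan can be shown with a character-based toy. The real code works on tokens produced by the lexer and can cross line boundaries; the sketch below does neither and does not skip whitespace. It only walks back over identifiers, member-access operators and balanced brackets, and the function name expressionBefore is an assumption for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static bool isIdentifierChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns the expression that ends right before position `end` in `text`.
std::string expressionBefore(const std::string &text, std::size_t end)
{
    assert(end <= text.size());
    std::size_t pos = end;
    while (pos > 0) {
        const char c = text[pos - 1];
        if (isIdentifierChar(c) || c == '.') {
            --pos;
        } else if (pos >= 2 && c == '>' && text[pos - 2] == '-') {
            pos -= 2;                       // "->"
        } else if (pos >= 2 && c == ':' && text[pos - 2] == ':') {
            pos -= 2;                       // "::"
        } else if (c == ')' || c == ']') {
            // Walk back to the matching opening bracket.
            int depth = 0;
            std::size_t p = pos;
            while (p > 0) {
                const char d = text[p - 1];
                if (d == ')' || d == ']')
                    ++depth;
                else if (d == '(' || d == '[')
                    --depth;
                --p;
                if (depth == 0)
                    break;
            }
            if (depth != 0)
                break;                      // unbalanced, give up here
            pos = p;
        } else {
            break;                          // anything else ends the expression
        }
    }
    return text.substr(pos, end - pos);
}
```

Called with the position of the dot that triggered completion, it returns the text to hand over to the next stage, for example hello in the case discussed here.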

Due to my lack of creativity, the spelling hello is found. This UTF-8 encoded string is passed on to the CPlusPlus::TypeOfExpression, which will eventually return a list of candidate symbols to be matched against. This list is, once more, computed by a visitor, CPlusPlus::ResolveExpression (in this case an AST one). The logic is similar to that in the CPlusPlus::LookupContext. If everything goes right, Creator will detect that it needs to propose members of hello’s class type. Perhaps one of them is a method called world. :-)

Although the descriptions were quite brief, I think we went over most of the building blocks of the C++ code model. Maybe this helps you to have a better initial overview when familiarizing yourself with the source tree.

The clang branch

There’s currently an effort to migrate Creator’s C++ model (at least partially) to use clang, which certainly has an awesome front-end infrastructure. The story started more or less at this blog post. By that time there were already a few parts implemented (although not complete) and the main question concerned the overall performance, in particular for indexing.

I’m not going into details here – especially because I’m out-of-date on the subject, since I (unfortunately) stopped developing Qt Creator. But I guess it should be possible to use the clang version reasonably well for the many features that comprise the C++ editor. If you need more details, the best would be to get in touch with the team.

4 thoughts on “A bit about Qt Creator’s C++ model”

  1. Hi,

    Very cool information about Qt Creator, I am planning on trying to contribute to the project, and I found this blog in Google, I will take my shot with the cplusplus project.

    Best Regards.

  2. Thanks for your article. I noted that in your article the module CPlusPlus appears most frequently and found that in the code the CppTools module depends on CPlusPlus. I think CPlusPlus is a basic building block which provides information about the code for other modules. So
    1. what does CppTools do?
    2. I want to extract all functions’ call graph from a project and visualize it. In order to build the graph’s edges, I plan to scan all functions and find out what other functions they invoke. How can I do that efficiently?

    • Hi dydx, thanks for reading it. :-)

      The CPlusPlus namespace contains essentially the pieces of a C++ compiler front-end like the preprocessor, lexer, parser, AST (together with visitors), and symbols. This is a standalone component and not necessarily bound to Qt Creator.

      Inside CppTools and CppEditor you’ll find the entities which process the “output” of CPlusPlus and assemble that into a form which is provided by the C++ editor through features like completion. The difference between those two namespaces is, however, more tenuous. Originally, CppEditor would only contain things directly related to the editor itself, while CppTools would contain algorithms and functionality basically relying on C++ symbol information. Over time things got a bit mixed up since we realised that such a distinction (or let’s say, abstraction) was sometimes not possible or would require too much for no apparent benefit. More than once, there were talks about combining these two plugins into a single one.

      For building the call graph, I guess you need to visit the compound statement of the function in question and follow (eventually resolving) the CallAST expressions.
