Kythe - Callgraphs

Forward declarations
Overrides
Final query

The Kythe graph contains information about callable objects like functions and ref/call edges that target them. This information can be used to satisfy queries about the locations at which different functions are called:

foo calls bar. (C++)

//- @bar defines/binding FnBar
void bar() { }
//- @"bar()" ref/call FnBar
//- @"bar()" childof FnFoo
//- @foo defines/binding FnFoo
void foo() { bar(); }

In this example, function foo makes a single call to function bar. The callsite is the expression bar(). This is spanned by an anchor that has a ref/call edge pointing to the function node for bar. The anchor also has a childof edge that points to function foo. This edge indicates that the effects of the anchor should be blamed on foo. There will be up to one such blame childof edge with an anchor source and a semantic target.

The assertions in the example trace a possible set of queries made against a Kythe graph:

Identify the semantic object for the definition in question (FnBar).
Find the locations at which that object is called. These will be ref/call edges with anchor sources and semantic targets.
Find the semantic objects associated with each callsite anchor. These will be attached by childof edges.
Finally, we can show the binding location for the caller by looking up its defines/binding edge.

These queries will not capture the full set of traditional callsites. We will fix this by first finding calls to forward declarations; then we will consider overrides.

Forward declarations

A call made using a forward declaration may not always ref/call the definition that completedby that declaration. This allows the Kythe graph to model a set of programs (where different programs link against different implementations for the same declaration) rather than a single program (where linking different implementations in this way would result in undefined behavior). In order to build a complete call graph, one must therefore also look along completedby edges to find alternate possible nodes that may be used to make calls to a given declaration or definition.

foo calls a forward-declared bar. (C++)

#include "bardecl.h"
//- @"c.bar()" ref/call BarDecl
void foo(C& c) { c.bar(); }

#example bardecl.h
//- @bar defines/binding BarDecl
struct C { void bar(); };

#example barimpl.cc
#include "bardecl.h"
//- @bar defines/binding BarImpl
void C::bar() { }

// Imagine that we only had BarDecl (by looking up the ref/call from c.bar()
// in foo()). We can find the implementation BarImpl of C::bar by taking the
// following trip:
//- BarDecl completedby BarImpl

Note that this overestimates the possible set of nodes for a given callee: for example, in a given program a forward declaration might never be linked (in the joining-together-object-files sense) with a definition. In order to filter these extra associations out, one would need to consider both which program is in focus by the user and which modules went into building that program. This is out of scope for this document.

completedby relationships should only be established when an indexer observes that a completion has actually occurred. This allows users of the graph to avoid connecting implementations to signatures that those implementations never observe:

Unrelated defs and decls can be distinguished. (C++)

#example foo1.h
// A header that declares a global foo().
//- @foo defines/binding Foo1Decl
void foo();
#example foo2.h
// An unrelated header that also declares a foo().
//- @foo defines/binding Foo2Decl
void foo();
#example foo1.cc
#include "foo1.h"
// This definition sees the first declaration (but never the second).
//- @foo defines/binding Foo1Defn
//- Foo1Decl completedby Foo1Defn
void foo() { }
#example foo2.cc
#include "foo2.h"
// This definition sees only the second declaration.
//- @foo defines/binding Foo2Defn
//- Foo2Decl completedby Foo2Defn
void foo() { }

Finally, be careful to follow the completedby relationship in both directions, since the Kythe graph records specifically which declaration is used in a function call.

Make sure to reach every callsite. (C++)

#include "foo.h"
//- @"foo()"=FooDeclCall ref/call FooDecl
void baz() { foo(); }
//- @foo=FooImplAnchor defines/binding FooImpl
void foo() { }
//- @"foo()"=FooImplCall ref/call FooImpl
void bar() { foo(); }
#example foo.h
//- @foo=FooDeclAnchor defines/binding FooDecl
void foo();

// Finding all callsites for foo() follows the same query pattern. Depending
// on your starting point, different VNames are unknown: if you begin with
// FooImplAnchor, you'll need to search forward along the completedby edge for
// FooDecl.
//- FooDecl completedby FooImpl

// If you start with FooDeclAnchor, you first need to find FooDecl,
// then you can search backward for all of the nodes that complete it.
//- FooImplAnchor defines/binding FooImpl
//- FooDeclAnchor defines/binding FooDecl

// Then look for all the ref/call edges terminating at those nodes:
//- FooDeclCall ref/call FooDecl
//- FooImplCall ref/call FooImpl

Overrides

For callsites that involve dynamic binding, Kythe indexers should emit ref/call edges to the most specific possible node.

Use the most specific static overrides. (C++)

//- @f defines/binding DefSF
struct S { virtual void f() { } };
//- @f defines/binding DefTF
//- DefTF overrides DefSF
struct T : public S { void f() override { } };
//- @"s->f()" ref/call DefSF
void CallSF(S* s) { s->f(); }
//- @"t->f()" ref/call DefTF
void CallTF(T* t) { t->f(); }

Depending on the semantics of the query you want to make, you may need to walk up or down the override chain. For example, to find all of the calls to S::f in the example above, including those calls that are made to functions that override the implementation in S:

Follow the override chain downward. (C++)

//- @f defines/binding DefSF
struct S { virtual void f() { } };
struct T : public S { void f() override { } };
void CallSF(S* s) { s->f(); }
void CallTF(T* t) { t->f(); }

// Starting with DefSF, find all callers (and callers of overrides):
// Build a set of overrides.
//- DefTF overrides DefSF
// Find all callsites.
//- DefSFCall ref/call DefSF
//- DefTFCall ref/call DefTF

Final query

We can now build a single query that will unify defs and decls and search up and down the override chain. This query starts with some semantic object Fn and yields the various anchors that call Fn in the broadest sense.

Add Fn to set O.
Repeat until fixed point:
1. For every o in O, for all nodes o' such that o' overrides o or o overrides o', add o' to O.
2. For every o in O, for all nodes o' such that (for some a) a defines/binding o' and (o completedby o'), add o' to O.
3. For every o in O, for all nodes o' such that (for some a) a defines/binding o and (o' completedby o), add o' to O.
Return the set of all a such that there is some o in O such that a ref/call o.

Since this query is expensive in practice, we (intend to) precompute it as part of building serving tables.