Callgraphs
The Kythe graph contains information about callable objects like functions and ref/call edges that target them. This information can be used to satisfy queries about the locations at which different functions are called:
foo calls bar. (C++)
//- @bar defines/binding FnBar void bar() { } //- @"bar()" ref/call FnBar //- @"bar()" childof FnFoo //- @foo defines/binding FnFoo void foo() { bar(); }
In this example, function foo
makes a single call to function bar
. The
callsite is the expression bar()
. This is spanned by an anchor that has a
ref/call
edge pointing to the function node for bar
. The anchor also has a
childof
edge that points to function foo
. This edge indicates that the
effects of the anchor should be blamed on foo
. There will be up to one such
blame childof
edge with an anchor source and a semantic target.
The assertions in the example trace a possible set of queries made against a Kythe graph:
-
Identify the semantic object for the definition in question (
FnBar
). -
Find the locations at which that object is called. These will be
ref/call
edges withanchor
sources and semantic targets. -
Find the semantic objects associated with each callsite anchor. These will be attached by
childof
edges. -
Finally, we can show the binding location for the caller by looking up its
defines/binding
edge.
These queries will not capture the full set of traditional callsites. We will fix this by first finding calls to forward declarations; then we will consider overrides.
Forward declarations
A call made using a forward declaration may not always ref/call
the definition
that completedby that declaration. This allows the
Kythe graph to model a set of programs (where different programs link against
different implementations for the same declaration) rather than a single program
(where linking different implementations in this way would result in undefined
behavior). In order to build a complete call graph, one must therefore also look
along completedby
edges to find alternate possible nodes that may be used to
make calls to a given declaration or definition.
foo calls a forward-declared bar. (C++)
#include "bardecl.h" //- @"c.bar()" ref/call BarDecl void foo(C& c) { c.bar(); } #example bardecl.h //- @bar defines/binding BarDecl struct C { void bar(); }; #example barimpl.cc #include "bardecl.h" //- @bar defines/binding BarImpl void C::bar() { } // Imagine that we only had BarDecl (by looking up the ref/call from c.bar() // in foo()). We can find the implementation BarImpl of C::bar by taking the // following trip: //- BarDecl completedby BarImpl
Note that this overestimates the possible set of nodes for a given callee: for example, in a given program a forward declaration might never be linked (in the joining-together-object-files sense) with a definition. In order to filter these extra associations out, one would need to consider both which program is in focus by the user and which modules went into building that program. This is out of scope for this document.
completedby
relationships should only be established when an indexer observes
that a completion has actually occurred. This allows users of the graph to
avoid connecting implementations to signatures that those implementations
never observe:
Unrelated defs and decls can be distinguished. (C++)
#example foo1.h // A header that declares a global foo(). //- @foo defines/binding Foo1Decl void foo(); #example foo2.h // An unrelated header that also declares a foo(). //- @foo defines/binding Foo2Decl void foo(); #example foo1.cc #include "foo1.h" // This definition sees the first declaration (but never the second). //- @foo defines/binding Foo1Defn //- Foo1Decl completedby Foo1Defn void foo() { } #example foo2.cc #include "foo2.h" // This definition sees only the second declaration. //- @foo defines/binding Foo2Defn //- Foo2Decl completedby Foo2Defn void foo() { }
Finally, be careful to follow the completedby
relationship in both directions,
since the Kythe graph records specifically which declaration is used in a
function call.
Make sure to reach every callsite. (C++)
#include "foo.h" //- @"foo()"=FooDeclCall ref/call FooDecl void baz() { foo(); } //- @foo=FooImplAnchor defines/binding FooImpl void foo() { } //- @"foo()"=FooImplCall ref/call FooImpl void bar() { foo(); } #example foo.h //- @foo=FooDeclAnchor defines/binding FooDecl void foo(); // Finding all callsites for foo() follows the same query pattern. Depending // on your starting point, different VNames are unknown: if you begin with // FooImplAnchor, you'll need to search forward along the completedby edge for // FooDecl. //- FooDecl completedby FooImpl // If you start with FooDeclAnchor, you first need to find FooDecl, // then you can search backward for all of the nodes that complete it. //- FooImplAnchor defines/binding FooImpl //- FooDeclAnchor defines/binding FooDecl // Then look for all the ref/call edges terminating at those nodes: //- FooDeclCall ref/call FooDecl //- FooImplCall ref/call FooImpl
Overrides
For callsites that involve dynamic binding, Kythe indexers should emit
ref/call
edges to the most specific possible node.
Use the most specific static overrides. (C++)
//- @f defines/binding DefSF struct S { virtual void f() { } }; //- @f defines/binding DefTF //- DefTF overrides DefSF struct T : public S { void f() override { } }; //- @"s->f()" ref/call DefSF void CallSF(S* s) { s->f(); } //- @"t->f()" ref/call DefTF void CallTF(T* t) { t->f(); }
Depending on the semantics of the query you want to make, you may need to walk
up or down the override chain. For example, to find all of the calls to S::f
in the example above, including those calls that are made to functions that
override the implementation in S
:
Follow the override chain downward. (C++)
//- @f defines/binding DefSF struct S { virtual void f() { } }; struct T : public S { void f() override { } }; void CallSF(S* s) { s->f(); } void CallTF(T* t) { t->f(); } // Starting with DefSF, find all callers (and callers of overrides): // Build a set of overrides. //- DefTF overrides DefSF // Find all callsites. //- DefSFCall ref/call DefSF //- DefTFCall ref/call DefTF
Final query
We can now build a single query that will unify defs and decls and search up and
down the override chain. This query starts with some semantic object Fn
and
yields the various anchors that call Fn
in the broadest sense.
-
Add
Fn
to setO
. -
Repeat until fixed point:
-
For every
o
inO
, for all nodeso'
such thato' overrides o
oro overrides o'
, addo'
toO
. -
For every
o
inO
, for all nodeso'
such that (for somea
)a defines/binding o'
and (o completedby o'
), addo'
toO
. -
For every
o
inO
, for all nodeso'
such that (for somea
)a defines/binding o
and (o' completedby o
), addo'
toO
.
-
-
Return the set of all
a
such that there is someo
inO
such thata ref/call o
.
Since this query is expensive in practice, we (intend to) precompute it as part of building serving tables.