Kythe - Kythe Schema Reference

Kythe Namespace
VName conventions
Edge kinds
Common node facts
Node kinds
Variance
Language-specific rules

Tip

This document is part of the Kythe test suite.

Successfully generating this document is part of the test suite for the Kythe indexers. The assertions in the example code all verify and the graphs provided are the graphs that are actually output. Feel free to add examples from your own languages, but be sure to keep them up to date.

Kythe Namespace

All fact names in this doc are implicitly prefixed by /kythe/; e.g., node/kind is really /kythe/node/kind.
All edge kinds in this doc are implicitly prefixed by /kythe/edge/; e.g., aliases is really /kythe/edge/aliases.

The verifier will automatically prepend the respective prefix unless the fact name or edge kind starts with /.
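
The prefixing rule can be sketched as a pair of small helpers. This is only an illustration of the rule as stated here, not the verifier's actual code; the names ExpandWithPrefix, ExpandFactName, and ExpandEdgeKind are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers illustrating the prefixing rule described above.
// They are not part of the Kythe verifier.

// Returns `name` unchanged if it already starts with '/', otherwise
// prepends `prefix`.
std::string ExpandWithPrefix(const std::string& prefix,
                             const std::string& name) {
  assert(!name.empty());
  if (name.front() == '/') return name;
  return prefix + name;
}

// Fact names are implicitly prefixed by "/kythe/".
std::string ExpandFactName(const std::string& name) {
  return ExpandWithPrefix("/kythe/", name);
}

// Edge kinds are implicitly prefixed by "/kythe/edge/".
std::string ExpandEdgeKind(const std::string& kind) {
  return ExpandWithPrefix("/kythe/edge/", kind);
}
```

Under these assumptions, ExpandFactName("node/kind") yields /kythe/node/kind, while a name that already starts with /, such as /kythe/loc/start, is returned unchanged.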

VName conventions

By default, assume that the VNames of nodes should be chosen according to the following rules (more details are given in the storage model):

language: the source language.
corpus: the node’s containing corpus.
root: a root path relative to the node’s corpus.
path: a path relative to the corpus and root of the node.
signature: a unique string (per corpus, root, path, and language) that should be consistently generated given the same input to the indexer, but that does not necessarily need to be stable across different versions of the input.

Additional rules govern the generation of VNames for certain kinds of nodes, most notably files. These nodes are frequently used as points for linking together the output of discrete indexer runs and may have greater stability properties than may be derived using the preceding VName rules.

Absent additional rules, an indexer is permitted to encode the signature field arbitrarily, as long as the chance of this encoding causing distinct field values to become indistinguishable is vanishingly small. This is meant to permit implementations to use one-way hash functions to crunch large signature values down to manageable fingerprints.
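
As a rough sketch of the fingerprinting idea, the code below models a VName as a plain struct and replaces a long signature with a fixed-width digest. Everything here is an assumption made for illustration: the struct layout, the helper names, and in particular the use of 64-bit FNV-1a, which is only a dependency-free stand-in. FNV-1a is not a cryptographic hash, and a real indexer would more likely choose a one-way hash with a much lower collision probability.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The five VName fields listed above, modeled as plain strings.
struct VName {
  std::string signature;
  std::string corpus;
  std::string root;
  std::string path;
  std::string language;
};

// 64-bit FNV-1a. ASSUMPTION: used only as a stand-in for whatever one-way
// hash an indexer actually uses; it is not collision-resistant enough for
// production use.
std::uint64_t Fnv1a64(const std::string& data) {
  std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 0x100000001b3ULL;  // FNV prime
  }
  return hash;
}

// Renders the digest of `long_signature` as 16 lowercase hex digits.
std::string FingerprintSignature(const std::string& long_signature) {
  static const char kHex[] = "0123456789abcdef";
  std::uint64_t hash = Fnv1a64(long_signature);
  std::string out(16, '0');
  for (int i = 15; i >= 0; --i) {
    out[i] = kHex[hash & 0xF];
    hash >>= 4;
  }
  return out;
}

// Returns a copy of `name` whose signature has been replaced by its
// fingerprint. The other four fields are left untouched.
VName WithFingerprint(VName name) {
  name.signature = FingerprintSignature(name.signature);
  return name;
}
```

The same input always produces the same fingerprint, which matches the requirement that signatures be generated consistently for the same indexer input.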

Edge kinds

Tip

Reverse Edges

The Kythe API uses `%` to denote a reverse edge. For example, if NodeA `defines/binding` NodeB, then NodeB `%/kythe/edge/defines/binding` NodeA. Reverse edges are constructed during post-processing and should not be emitted by indexers.
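
A minimal sketch of how a post-processing step might derive a reverse edge from a forward one follows. The Edge struct and the function names are hypothetical, and nodes are identified by plain strings rather than VNames to keep the example short.

```cpp
#include <cassert>
#include <string>

// A directed edge between two nodes.
// ASSUMPTION: real Kythe nodes are identified by VNames; strings are used
// here only for brevity.
struct Edge {
  std::string source;
  std::string kind;  // fully qualified, e.g. "/kythe/edge/defines/binding"
  std::string target;
};

// Returns true if `kind` denotes a reverse edge (it starts with '%').
bool IsReverseKind(const std::string& kind) {
  return !kind.empty() && kind.front() == '%';
}

// Builds the reverse of a forward edge: the endpoints are swapped and the
// kind is prefixed with '%'.
Edge Reverse(const Edge& forward) {
  assert(!IsReverseKind(forward.kind));
  return Edge{forward.target, "%" + forward.kind, forward.source};
}
```

Given the edge NodeA /kythe/edge/defines/binding NodeB, Reverse returns NodeB %/kythe/edge/defines/binding NodeA.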

aliases

Brief description: A aliases T if A may be used in place of T.
Commonly arises from: typedefs, imports
Points from: [talias]
Points toward: types
Ordinals are used: never
See also: [aliases/root]

Typedefs are aliases. (C++)

//- @Counter defines/binding TAlias
//- TAlias aliases TInt
typedef int Counter;

aliases/root

Brief description: A aliases/root T if following all aliases edges from A may lead to T, possibly with language-specific qualifiers applied.
Commonly arises from: typedefs
Points from: [talias]
Points toward: types
Ordinals are used: never
See also: [aliases]

Following aliases. (C++)

//- @T defines/binding AliasT
//- AliasT aliases TInt
//- AliasT aliases/root TInt
using T = int;
//- AliasS aliases AliasT
//- AliasS aliases/root TInt
using S = T;
//- AliasU aliases AliasS
//- AliasU aliases/root TInt
using U = S;
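
The relationship asserted in the example above can be sketched as a walk over aliases edges. This is an illustrative assumption about how a root could be derived, not the indexer's implementation; the AliasEdges type and AliasRoot function are hypothetical, and the collection of language-specific qualifiers is left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Maps each alias node to the node it directly aliases.
// ASSUMPTION: every alias has exactly one outgoing `aliases` edge.
using AliasEdges = std::map<std::string, std::string>;

// Follows `aliases` edges from `node` until reaching a node that has no
// outgoing `aliases` edge, and returns that node. A node that is not an
// alias is its own root. The walk stops if it revisits a node, so malformed
// cyclic input cannot loop forever.
std::string AliasRoot(const AliasEdges& aliases, const std::string& node) {
  std::set<std::string> seen;
  std::string current = node;
  while (seen.insert(current).second) {
    auto it = aliases.find(current);
    if (it == aliases.end()) break;
    current = it->second;
  }
  return current;
}
```

With the edges AliasT aliases TInt, AliasS aliases AliasT, and AliasU aliases AliasS, AliasRoot returns TInt for all three aliases.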

Following aliases and collecting qualifiers. (C++)

//- @CInt defines/binding ConstIntAlias
//- ConstIntAlias aliases CInt
//- CInt.node/kind tapp
//- CInt param.0 Const
//- CInt param.1 TInt
using CInt = const int;
//- @T defines/binding AliasT
//- AliasT aliases TInt
//- AliasT aliases/root TInt
using T = int;
//- AliasS aliases ConstAliasT
//- ConstAliasT param.0 Const
//- ConstAliasT param.1 AliasT
//- AliasS aliases/root CInt
using S = const T;
//- AliasU aliases AliasS
//- AliasU aliases/root CInt
using U = S;

annotatedby

Brief description: A annotatedby B if A provides metadata for B.
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: never

Classes can be annotated. (Java)

//- @Deprecated ref Deprecated
//- @E defines/binding Class
//- Class annotatedby Deprecated
@Deprecated public class E {}

bounded/upper or bounded/lower

Brief description

[tvar] A is bounded/upper by B when A is constrained to be a subtype of B. [tvar] A is bounded/lower by B when A is constrained to be a supertype of B.

See also

[tvar]

Commonly arises from

Type parameters

Notes

Kythe leaves the interpretation of unbounded [tvar] nodes to each language. For example, a [tvar] with no bounded edges in Java may be assigned any subtype of Object, but no primitive type.
It is possible for a [tvar] to have multiple bounds. For example, this occurs in Java when a type parameter must implement several interfaces. In cases where the order of the bounds matters (e.g., in Java, where the order affects type erasure), the bound edge kind may be qualified by an ordinal, so that A is bounded/upper.N by B if B is the Nth upper bound of A. Code interpreting bounded edges should be able to handle both ordered and unordered edges.
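
One way client code might accept both ordered and unordered bound edges is to split an optional trailing ordinal off the edge kind before comparing it. The sketch below assumes the ordinal is written as a decimal suffix after the final dot, as in bounded/upper.1; ParsedKind, ParseEdgeKind, and IsUpperBound are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// An edge kind split into its base kind and an optional ordinal.
struct ParsedKind {
  std::string base;  // e.g. "bounded/upper"
  bool has_ordinal;  // true only for kinds such as "bounded/upper.1"
  int ordinal;       // meaningful only when has_ordinal is true
};

// Splits a trailing ".N" (N a non-empty run of decimal digits) off `kind`.
// Kinds without such a suffix are returned whole, with has_ordinal false.
ParsedKind ParseEdgeKind(const std::string& kind) {
  const std::size_t dot = kind.rfind('.');
  if (dot == std::string::npos || dot + 1 == kind.size()) {
    return ParsedKind{kind, false, 0};
  }
  int ordinal = 0;
  for (std::size_t i = dot + 1; i < kind.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(kind[i]);
    if (!std::isdigit(c)) return ParsedKind{kind, false, 0};
    ordinal = ordinal * 10 + (c - '0');
  }
  return ParsedKind{kind.substr(0, dot), true, ordinal};
}

// Treats "bounded/upper" and "bounded/upper.N" alike.
bool IsUpperBound(const std::string& kind) {
  return ParseEdgeKind(kind).base == "bounded/upper";
}
```

Both bounded/upper and bounded/upper.1 are then recognized as upper bounds, and the ordinal is available when the order of the bounds matters.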

Generic type parameters can be bound. (Java)

package pkg;
import java.util.Optional;
public class E {
  //- @"Optional<?>" ref OptionalWild
  //- OptionalWild.node/kind tapp
  //- OptionalWild param.0 OptionalClass
  //- OptionalWild param.1 Wildcard0
  //- Wildcard0.node/kind tvar
  //- !{ Wildcard0 bounded/upper Anything0
  //-    Wildcard0 bounded/lower Anything1 }
  private static void wildcard(Optional<?> o) {}

  //- @"Optional<? extends String>" ref OptionalWildString
  //- OptionalWildString.node/kind tapp
  //- OptionalWildString param.0 OptionalClass
  //- OptionalWildString param.1 Wildcard1
  //- Wildcard1.node/kind tvar
  //- Wildcard1 bounded/upper Str
  //- @String ref Str
  //- !{ Wildcard1 bounded/lower Anything2 }
  private static void wildcardBound(Optional<? extends String> o) {}

  //- @"Optional<? super String>" ref OptionalWildSuperString
  //- OptionalWildSuperString.node/kind tapp
  //- OptionalWildSuperString param.0 OptionalClass
  //- OptionalWildSuperString param.1 WildcardSuper1
  //- WildcardSuper1.node/kind tvar
  //- WildcardSuper1 bounded/lower Str
  //- !{ WildcardSuper1 bounded/upper Anything1 }
  //- @String ref Str
  private static void wildcardSuperBound(Optional<? super String> o) {}

  //- @objAndOneIFaceBound defines/binding OIFunc
  //- @S1 defines/binding S1Var
  //- @List ref List
  //- S1Var bounded/upper.0 Obj
  //- S1Var bounded/upper.1 List
  public <S1 extends Object & java.util.List> void objAndOneIFaceBound() {}

}

childof

Brief description: A childof B if A is contained in or dominated by B.
Commonly arises from: [anchor]s, block syntax, membership
Points from: any
Points toward: semantic nodes
Ordinals are used: never
Notes: [anchor]s should not be childof the [file] in which they reside. As an optimization due to the overwhelming number of such edges, the parentage relationship between these nodes is determined by the shared corpus, path, and root VName fields.
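
Because no childof edge links an anchor to its file, a client has to recover that relationship from the VNames. A minimal sketch of the comparison, assuming a VName is modeled as a struct of five strings (the struct and the AnchorInFile function are hypothetical):

```cpp
#include <cassert>
#include <string>

// The five VName fields, modeled as plain strings.
struct VName {
  std::string signature;
  std::string corpus;
  std::string root;
  std::string path;
  std::string language;
};

// Returns true if `anchor` resides in `file`, judged only by the shared
// corpus, root, and path fields. The signature and language fields are
// deliberately ignored.
bool AnchorInFile(const VName& anchor, const VName& file) {
  return anchor.corpus == file.corpus &&
         anchor.root == file.root &&
         anchor.path == file.path;
}
```

An anchor and a file node that agree on corpus, root, and path are treated as parent and child even though their signatures and languages differ.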

Enumerators are children of enumerations. (C++)

//- @Enum defines/binding Enumeration
enum class Enum {
//- @Etor defines/binding Enumerator
  Etor
};
//- Enumerator childof Enumeration

childof/context

Brief description: A childof/context T if anchor A is associated with some instantiation T.
Commonly arises from: template instantiations
Points from: [anchor]s
Points toward: semantic nodes
Ordinals are used: never

Template instantiations create new anchor contexts. (C++)

//- @C defines/binding CTemplate
//- CTemplateBody childof CTemplate
template <typename T> struct C {
//- @x=AnchorX defines/binding XBinding
//- @x=AnchorCX defines/binding CXBinding
  int x;
};
//- @C defines/binding CInst
//- AnchorCX childof/context CInst
//- !{ AnchorX childof/context CInst }
template struct C<int>;

completedby

Brief description: Declaration A completedby definition B if B fully specifies A. There may exist other definitions that may also fully specify A.
Commonly arises from: definitions of forward declarations
Points from: semantic nodes with complete facts set to incomplete or complete
Points toward: semantic nodes with complete facts set to complete
See also: [record], [sum]

Declarations are completed by definitions. (C++)

#include "test.h"
//- Decl1 completedby Defn
//- Decl2 completedby Defn
//- @C defines/binding Defn
class C { };

#example test.h
//- @C defines/binding Decl1
class C;
//- @C defines/binding Decl2
class C;

defines

Brief description: A defines B if A generates the semantic object B.
Commonly arises from: definitions and declarations
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never
See also: [defines/binding]
Notes: It is valid for multiple anchors to define the same semantic object. These anchors may even overlap.

Class definitions span their entire body. (Java)

//- ClassEDef defines ClassE
//- ClassEDef.node/kind anchor
//- ClassEDef.loc/start @^public
//- ClassEDef.loc/end @$+3"}"
public class E {
  // class contents here...
}

Method definitions span their entire body. (Java)

public class E {
  //- MethodDef defines Method
  //- MethodDef.node/kind anchor
  //- MethodDef.loc/start @^public
  //- MethodDef.loc/end @$+3"}"
  public int methodName(int param) {
    return 42;
  }
}

defines/binding

Brief description: A defines/binding B when A covers an identifier bound to B when that binding is established.
Commonly arises from: definitions
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never
See also: [defines]
Notes: Source anchors are not necessarily identifiers. For example, the C++ indexer will start a defines/binding edge from an anchor spanning the text operator().

Class names bind their definitions. (Java)

//- @E defines/binding ClassE
public class E {}

Method names bind their definitions. (Java)

public class E {
  //- @main defines/binding MethodMain
  public static void main(String[] args) {}
}

Variable definitions define bindings for variables. (C++)

//- @x defines/binding VariableX
int x;

defines/implicit

Brief description: A defines/implicit B if A semantically defines an entity B, but that definition is not explicitly related to the decorated text.
Commonly arises from: filename-based packages, C++ special member functions
Points from: anchors
Points toward: semantic nodes
See also: [package]
Notes: This edge is most commonly used in two situations: languages whose modules have no explicit syntactic marker, so that the module name is derived from the file or directory path; and C++'s compiler-defined special member functions.

Implicit destructor definition. (C++)

///- @X defines/implicit XDtor
///- XDtor.subkind destructor
class X {};

Implicit module definition. (clike)

///- Mod=vname(_, _, _, _, _).node/kind package
///- ModAnchor.node/kind anchor
///- ModAnchor./kythe/loc/start 0
///- ModAnchor./kythe/loc/end 0
///- ModAnchor defines/implicit Mod

denotes

Brief description

A denotes B if A is a concrete representation of B, an abstract representation that may not be written down.

Commonly arises from

library support code (flags, database wrappers, etc)

Points from

semantic nodes with a definition

Points toward

semantic nodes

Ordinals are used

never

See also

[completedby], [generates]

Notes

In contrast to generates, where it is expected that both source and target nodes have definition locations, only the source of a denotes edge must have a definition location. The source (or concrete representation) is used as the canonical definition of the target (or abstract representation).

If the source of a denotes edge participates in a completion relationship, that source should have completion type `definition`. It is not necessary for the target to also participate in a completion relationship or to have a completion fact set.

It is not necessary for the source and target to share the same node kind.

depends

Brief description: A depends B if processing of A depends on the existence or presence of B. For example, a process may depend on a set of files and/or other processes. As another example, a file depends on the process that outputs it.
Commonly arises from: build dependencies
Points from: [process], [file]
Points toward: [process], [file]
Ordinals are used: never

documents

Brief description: A documents B if A describes (in possibly marked up natural language) the semantic object B.
Commonly arises from: documentation comments
Points from: anchors and [doc]s
Points toward: semantic nodes
Ordinals are used: never
Notes: Kythe does not specify a particular flavor of markup. Documentation comment anchors include all of the characters of the comment, including (e.g.) the ///s. It is up to the language indexer to determine which comments to treat as documentation comments.
See also: [ref/doc]

In the C++ example below, there are really two documentation blocks: the first comes from merging together the verification annotations; the second is the Doxygen-style /// line. The Doxygen line is not merged with the verifier lines owing to heuristics in Clang’s comment parser.

Comments document objects. (C++)

int v;   //- @"/// An empty class." documents ClassC
         //- ClassC.node/kind record
/// An empty class.
class C { };

exports

Brief description: A exports B if process node A exports process node B.
Commonly arises from: build rules
Points from: process nodes
Points toward: process nodes
Ordinals are used: never
See also: [process]
Notes: Tools like Bazel have cases where build rules export other build rules: every rule in the closure reached via exports attributes is considered a direct dependency of any rule that directly depends on the target with exports. The exports are not direct dependencies of the rule they belong to.

Process node exports other process node. (clike)

//- ProcessNodeA.node/kind process
//- ProcessNodeB.node/kind process
//- ProcessNodeA exports ProcessNodeB
java_library(
  name = "A",
  exports = [
    ":B",
  ],
)

java_library(
  name = "B",
)
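
The dependency rule in the notes above can be sketched as a closure computation over exports edges. This is an illustrative assumption about how a client might interpret the edges, not a description of Bazel or of any Kythe tool; the Graph type, the separate deps graph, and both function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Adjacency list keyed by rule name.
using Graph = std::map<std::string, std::vector<std::string>>;

// Adds to `out` every rule reachable from `rule` by following one or more
// exports edges. `out` doubles as the visited set, so cycles terminate.
void CollectExports(const Graph& exports, const std::string& rule,
                    std::set<std::string>* out) {
  auto it = exports.find(rule);
  if (it == exports.end()) return;
  for (const std::string& exported : it->second) {
    if (out->insert(exported).second) {
      CollectExports(exports, exported, out);
    }
  }
}

// Returns the rules treated as direct dependencies of `rule`: its declared
// dependencies plus the exports closure of each of them. The exports of
// `rule` itself are not included.
std::set<std::string> EffectiveDeps(const Graph& deps, const Graph& exports,
                                    const std::string& rule) {
  std::set<std::string> result;
  auto it = deps.find(rule);
  if (it == deps.end()) return result;
  for (const std::string& dep : it->second) {
    result.insert(dep);
    CollectExports(exports, dep, &result);
  }
  return result;
}
```

If a hypothetical rule C depends on A from the example above, its effective dependencies are A and B. A itself gains no dependency on B merely by exporting it.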

extends

Brief description: A extends B if A explicitly derives from B. It neither implies nor excludes any type relationship between A and B.
Commonly arises from: inheritance
Points from: semantic nodes
Points toward: type/semantic nodes
Ordinals are used: never
Notes: An indexer may emit more descriptive edges with the extends prefix. For example, C++ will emit extends/public, extends/public/virtual, extends/protected, extends/protected/virtual, extends/private, extends/private/virtual, and extends/virtual.

Classes extend classes. (Java)

package pkg;
public class E {
  //- @A defines/binding ClassA
  static class A { }
  //- @B defines/binding ClassB
  //- ClassB extends ClassA
  static class B extends A { }
}

Classes and interfaces extend interfaces. (Java)

package pkg;
//- @I defines/binding IntfI
interface I { }

//- @J defines/binding IntfJ
//- IntfJ extends IntfI
interface J extends I { }

//- @C defines/binding ClassC
//- ClassC extends IntfJ
//- !{ClassC extends IntfI}
class C implements J { }

Classes extend classes. (C++)

//- @A defines/binding ClassA
class A { };

//- @B defines/binding ClassB
//- ClassB extends/public ClassA
class B : public A { };

//- @C defines/binding ClassC
//- ClassC extends/private ClassA
class C : private A { };

generates

Brief description: A generates B if A is related to B through some extralingual process.
Commonly arises from: code generation
Points from: semantic nodes, files
Points toward: semantic nodes, files
Ordinals are used: never
See also: [denotes], [imputes], [semanticgenerated]

Tools like RPC interface generators read specification languages and emit code in one or more target languages. Although the specification language and target languages do not share Kythe indexers, it is still semantically useful to connect the nodes they emit. For example, one might want to list all the C++ and Java uses of a particular service call, starting at the specification of that service. The specification and its generated artifacts may be joined by the generates edge. Either both the specification and its generated artifact are file nodes or both are semantic nodes.

influences

Brief description: A influences B if A directly affects B during the evaluation of a program.
Commonly arises from: assignment
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: never
Notes: This is an experimental definition and is expected to undergo refinement.

Assignment causes influence (C++)

void f() {
  //- @x defines/binding VarX
  //- @y defines/binding VarY
  //- @z defines/binding VarZ
  int x = 0, y = 1, z = 2;
  //- VarZ influences VarY
  //- VarY influences VarX
  //- !{VarZ influences VarX}
  //- !{VarY influences VarZ}
  x = y = z;
}

instantiates

Brief description: A instantiates B if A is the result of monomorphizing B.
Commonly arises from: implicit template application
Points from: semantic nodes
Points toward: semantic nodes ([tapp])
Ordinals are used: never
See also: [instantiates/speculative], [specializes]

In C++, specialization and instantiation capture distinct relationships. Every template T has a primary template, which defines the number and kind of template parameters that are written down whenever T is (normally) expressed. Other templates specialize T by specifying alternate bodies for the template depending on the values bound to the template parameters. This specializes relationship is always between a more-specific (or implicit) template and its primary template (applied to one or more arguments). We do not attempt to model a subtyping relationship between template specializations.

When T<...> is written down, an element from the set of T and its specializations must be chosen for manifestation. This element may have free type parameters. These are deduced during the process of instantiating the chosen specialization of T. C++ total specializations do not bind any template parameters. C++ partial specializations do, and may bind a different number of type parameters than the primary template. The instantiates relationship records which total or partial specialization was chosen (or whether the primary template was chosen), and the template arguments that were matched to that specialization’s parameters. In contrast, the specializes relationship for T<...> records the primary template for T, as well as which template arguments were substituted for the primary template’s parameters.

When the primary template is chosen for the instantiates relationship, the specializes edge points to the same node:

Instantiating the primary template (C++)

//- @t_equals_float defines/binding PrimaryTemplate
template<typename T, typename S> bool t_equals_float = false;

//- @t_equals_float ref TEqualsFloatForLongLong
//- TEqualsFloatForLongLong instantiates TAppLongLong
//- TEqualsFloatForLongLong specializes TAppLongLong
//- TAppLongLong param.0 PrimaryTemplate
bool is_false = t_equals_float<long, long>;

When a specialization of a template is chosen for instantiates, the specializes edge still points to the primary template applied to the correct number of arguments. The instantiates edge points to the specialization that was used. It is applied to the template arguments appropriate for that specialization. Note in the below example how we specialize PrimaryTemplate<float, long> but instantiate SpecificTemplate<long>:

Instantiating a partial specialization (C++)

//- @t_equals_float defines/binding PrimaryTemplate
template<typename T, typename S> bool t_equals_float = false;
//- @int ref IntType @long ref LongType
int i; long l;

//- @t_equals_float defines/binding SpecificTemplate
template <typename S> bool t_equals_float<float, S> = true;
//- @t_equals_float ref TEqualsFloatForFloatLong
//- TEqualsFloatForFloatLong instantiates TAppSpecificFloatLong
//- TAppSpecificFloatLong param.0 SpecificTemplate
//- TAppSpecificFloatLong param.1 LongType
//- TEqualsFloatForFloatLong specializes TAppPrimaryFloatLong
//- TAppPrimaryFloatLong param.0 PrimaryTemplate
//- TAppPrimaryFloatLong param.1 FloatType
//- TAppPrimaryFloatLong param.2 LongType
bool is_true = t_equals_float<float, long>;

Here is another similar example:

Instantiation versus specialization. (C++)

//- @v defines/binding PrimaryTemplate
template <typename T, typename S, typename V> T v;
template <typename U>
//- @v defines/binding PartialSpecialization
U v<int, U, long>;
//- @v ref ImplicitSpecialization
float w = v<int, float, long>;
//- ImplicitSpecialization specializes TAppPrimaryTemplate
//- ImplicitSpecialization instantiates TAppPartialSpecialization
//- TAppPrimaryTemplate param.0 PrimaryTemplate
//- TAppPrimaryTemplate param.1 vname("int#builtin",_,_,_,_)
//- TAppPrimaryTemplate param.2 vname("float#builtin",_,_,_,_)
//- TAppPrimaryTemplate param.3 vname("long#builtin",_,_,_,_)
//- TAppPartialSpecialization param.0 PartialSpecialization
//- TAppPartialSpecialization param.1 vname("float#builtin",_,_,_,_)

instantiates/speculative

Brief description: A instantiates/speculative B if A could be the result of monomorphizing B.
Commonly arises from: implicit template application
Points from: semantic nodes
Points toward: semantic nodes ([tapp])
Ordinals are used: never
See also: [instantiates], [specializes]

It may not be possible to decide whether a type instantiation actually occurs, especially when dependent types are involved. The instantiates/speculative and specializes/speculative edges are like the instantiates and specializes edges, but they also record the fact that the instantiation (specialization) did not occur when the code was indexed.

Speculative instantiation and specialization. (C++)

// Checks indexing refs and defs of dependent function specializations.
//- @f defines/binding AbsF1
template <typename S> long f(S s) { return 0; }
//- @f defines/binding AbsF2
template <typename S> int f(S s) { return 0; }
template <typename T> struct S {
  // Note that C++ doesn't even check the kindedness of these type applications.
  friend
  //- @f defines/binding DepSpecFT
  //- DepSpecFT instantiates/speculative TAppAbsF1T
  //- DepSpecFT specializes/speculative TAppAbsF1T
  //- TAppAbsF1T param.0 AbsF1
  //- DepSpecFT instantiates/speculative TAppAbsF2T
  //- DepSpecFT specializes/speculative TAppAbsF2T
  //- TAppAbsF2T param.0 AbsF2
  long f<int, short>(T t);
};

imputes

Brief description: A imputes B if the syntactic span at A is related to the semantic node B through some extralingual process.
Commonly arises from: code generation
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never
See also: [denotes], [generates]

Mechanically, if A defines/binding A' and A imputes B, where A is an anchor and both A' and B are semantic nodes, the imputes edge has the same effect as though the edge A' generates B were produced. This edge is useful in cases where it isn’t otherwise possible to produce the VName for A' and where the span A in source text can uniquely identify A' by the defines/binding edge.
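
The mechanical rule above can be sketched as a small derivation over an edge list. This assumes edges are held in memory as simple string triples with unprefixed kinds; the Edge struct and DeriveGenerates function are hypothetical and only illustrate the equivalence.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A directed edge with an unprefixed kind, e.g. "defines/binding".
struct Edge {
  std::string source;
  std::string kind;
  std::string target;
};

// For every anchor A with both `A defines/binding A'` and `A imputes B`,
// produces the equivalent edge `A' generates B`.
std::vector<Edge> DeriveGenerates(const std::vector<Edge>& edges) {
  std::vector<Edge> derived;
  for (const Edge& binding : edges) {
    if (binding.kind != "defines/binding") continue;
    for (const Edge& impute : edges) {
      if (impute.kind == "imputes" && impute.source == binding.source) {
        derived.push_back(Edge{binding.target, "generates", impute.target});
      }
    }
  }
  return derived;
}
```

An anchor that has a defines/binding edge but no imputes edge contributes nothing to the result.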

narrows

Brief description: A narrows B if A identifies the same target language object as B but is subject to additional constraints.
Commonly arises from: type refinement
Points from: variables
Points toward: variables
Ordinals are used: never
See also: [generates], [typed]

Some languages allow the types of variables to change over the course of a program. For example, a type test can be used to narrow the type of a variable:

Object x;  // x : Object
if (x instanceof String) {
  x;  // x : String
}

In this example, every x refers to the same variable x, but the x in the body of the conditional has a different static type than both the x in the conditional expression and the x declared at the top. This program is represented in the following way:

//- @x defines/binding XObject
//- XObject typed Object
Object x;
//- @x ref XObject
if (x instanceof String) {
  //- @x ref XString
  //- XString narrows XObject
  //- XString typed String
  x;
}

The x with type String has no defines/binding edge, but is associated with the x with type Object with the narrows edge. Clients should interpret these variables as the same (in the way that nodes associated with the generates edge are the same), but should prefer facts and edges from the particular x being identified. In particular, documents edges and code facts should be chosen from the particular node that is the target of a ref when that node is in a narrows relationship.

named

Brief description: A named B if B is an external identifier for A.
Commonly arises from: definitions and declarations
Points from: semantic nodes
Points toward: name/linkage nodes
Ordinals are used: never
See also: [name]

Classes have JVM binary names. (Java)

package pkg;
//- @E defines/binding EClass
//- EClass named EClassName = vname("pkg.E", "kythe", "", "", "jvm")
//- EClassName.node/kind record
public class E {}

overrides

Brief description: A overrides B if A directly overrides B in an inheritance-based relationship.
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: never
See also: [overrides/transitive], [overrides/root]

Methods have overrides edges. (Java)

package pkg;
public class E {
  static class A implements I {
    //- @method defines/binding AMethod
    //- AMethod overrides IMethod
    public void method() {}
  }
  static class B extends A implements I {
    //- @method defines/binding BMethod
    //- BMethod overrides AMethod
    //- BMethod overrides IMethod
    public void method() {}
  }
  static interface I {
    //- @method defines/binding IMethod
    public void method();
  }
}

overrides/root

Brief description: A overrides/root B if following all overrides edges from A would lead to B.
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: never
See also: [overrides]

Override roots (C++)

//- @f defines/binding CF
class C { virtual void f()          { } };
//- @f defines/binding DF
class D : C {     void f() override { } };
//- @f defines/binding EF
class E : D {     void f() override { } };

//- !{CF overrides _}
//- !{CF overrides/root _}
//- DF overrides CF
//- DF overrides/root CF
//- EF overrides DF
//- EF overrides/root CF

overrides/transitive

Brief description: A overrides/transitive B if A transitively overrides B, but the relationship A [overrides] B doesn’t exist.
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: never
See also: [overrides]

Methods have overrides/transitive edges. (Java)

package pkg;
public class E {
  static class A {
    //- @method defines/binding AMethod
    public void method() {}
  }
  static class B extends A {
    //- @method defines/binding BMethod
    //- !{ BMethod overrides/transitive AMethod }
    public void method() {}
  }
  static class C extends B {
    //- @method defines/binding CMethod
    //- !{ CMethod overrides/transitive BMethod }
    //- CMethod overrides/transitive AMethod
    public void method() {}
  }
}
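
The definition above excludes direct overrides from overrides/transitive. A sketch of how the transitive targets could be computed from the direct overrides edges follows; it is an assumption made for illustration rather than the Java indexer's algorithm, and the OverrideEdges type and TransitiveOverrides function are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each method to the methods it directly overrides.
using OverrideEdges = std::map<std::string, std::set<std::string>>;

// Returns the methods that `method` overrides only transitively: those
// reachable through two or more overrides edges, excluding any method that
// `method` also overrides directly.
std::set<std::string> TransitiveOverrides(const OverrideEdges& overrides,
                                          const std::string& method) {
  std::set<std::string> direct;
  auto it = overrides.find(method);
  if (it != overrides.end()) direct = it->second;

  // Walk outward from the direct targets, recording everything beyond them.
  std::set<std::string> reachable;
  std::vector<std::string> stack(direct.begin(), direct.end());
  while (!stack.empty()) {
    const std::string current = stack.back();
    stack.pop_back();
    auto next = overrides.find(current);
    if (next == overrides.end()) continue;
    for (const std::string& target : next->second) {
      if (reachable.insert(target).second) stack.push_back(target);
    }
  }

  std::set<std::string> result;
  for (const std::string& target : reachable) {
    if (direct.count(target) == 0 && target != method) result.insert(target);
  }
  return result;
}
```

For the classes in the example above, CMethod overrides BMethod and BMethod overrides AMethod, so the only transitive override is from CMethod to AMethod.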

param

Brief description: A param.N B if B is the Nth parameter of A.
Commonly arises from: ordered lists
Points from: semantic nodes
Points toward: semantic nodes
Ordinals are used: always

Type applications have parameters. (C++)

//- @T defines/binding AliasT
//- AliasT aliases PtrInt
//- PtrInt param.0 PointerConstructor
//- PtrInt param.1 IntType
using T = int*;

property/reads

Brief description: A property/reads B if A is a reader of the property B.
Points from: function nodes
Points toward: variable or property nodes
Ordinals are used: never
See also: [property/writes]

property/writes

Brief description: A property/writes B if A is a writer of the property B.
Points from: function nodes
Points toward: variable or property nodes
Ordinals are used: never
See also: [property/reads]

ref

Brief description: A ref B if A refers to some previously-defined B.
Commonly arises from: expressions, spelled-out types
Points from: anchors
Points toward: semantic nodes

Mentions of variables are refs. (C++)

//- @x defines/binding VariableX
int x;
//- @y defines/binding VariableY
//- @x ref VariableX
int y = x;

Mentions of variables are refs. (Go)

package p

//- @x defines/binding VarX = vname(_,"kythe",_,"schema","go")
//- VarX.node/kind variable
var x int

//- @x ref VarX
var y = x

ref/implicit

Brief description: A ref/implicit B if A refers to some previously-defined B, and the expression spanned by A is implicit (e.g., the result of a template instantiation).
Commonly arises from: expressions, spelled-out types
Points from: anchors
Points toward: semantic nodes

References inside template instantiations are implicit. (C++)

template <typename T> class C {
  //- @foo ref/implicit SFoo
  int x = T::foo;
};
//- @foo defines/binding SFoo
struct S { static constexpr int foo = 1; };
C<S> cs;

ref/call

Brief description: A ref/call F if A is an anchor that calls F.
Points from: anchors
Points toward: functions
Ordinals are used: never

Anchors inside functions call functions. (C++)

//- @A defines/binding FnA
void A() { }
//- @B defines/binding FnB
//- ACall childof FnB
//- ACall.node/kind anchor
//- ACall ref/call FnA
void B() { A(); }

ref/call/direct

Brief description: A ref/call/direct F if A is an anchor that calls F and ignores indirect dispatch.
Points from: anchors
Points toward: functions
Ordinals are used: never

A direct call to a base struct's function. (C++)

//- @f defines/binding FnF
struct A { virtual void f() {} };
//- @"A::f()" ref/call/direct FnF
struct B : public A { void f() override { A::f(); }};

ref/call/implicit

Brief description: A ref/call/implicit F if A is an anchor that calls F, and the calling expression spanned by A is implicit (e.g., the result of a template instantiation).
Points from: anchors
Points toward: functions
Ordinals are used: never

Calls inside template instantiations are implicit. (C++)

template <typename T> class C {
  //- @"T::foo()" ref/call/implicit SFoo
  int x = T::foo();
};
//- @foo defines/binding SFoo
struct S { static int foo(); };
C<S> cs;

ref/doc

Brief description: A ref/doc C if A is an anchor inside a block of documentation that refers to C.
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never
See also: [documents]

Anchors in documentation can refer to semantic nodes. (C++)

//- @param_a ref/doc FooParamA
//- FnFoo param.0 FooParamA
//- FnFoo.node/kind function
/// `param_a` is the first parameter.
void foo(int param_a) { }

ref/expands

Brief description: A ref/expands M if A is an anchor that expands macro M.
Points from: anchors
Points toward: macros
Ordinals are used: never
Notes: This edge is used only for first-level macro expansions (where the macro being expanded is spelled out in the source file). Subsequent expansions are recorded using the [ref/expands/transitive] edge.

Uttering the name of a macro expands it. (C++)

//- @FOO defines/binding MacroFoo
#define FOO BAR
//- @FOO ref/expands MacroFoo
int FOO;

ref/expands/transitive

Brief description: A ref/expands/transitive M if A is an anchor that expands macro M', which (after one or more additional expansions) expands macro M.
Points from: anchors
Points toward: macros
Ordinals are used: never
Notes: First-level macro expansions (like those written down in the source file) are recorded with the [ref/expands] edge.

Macros can expand other macros. (C++)

//- @MB defines/binding MacroB
#define MB x
//- @MA defines/binding MacroA
#define MA MB
//- @MA ref/expands/transitive MacroB
//- @MA ref/expands MacroA
//- !{ @MA ref/expands/transitive MacroA }
int MA;

ref/file

Brief description: A ref/file F if A is an anchor referencing a file F. This is distinct from [ref/includes], which indicates that the anchor causes the contents of F to be inserted into the surrounding file. ref/file should be used when the anchor refers explicitly to the identity of F, as in a hardcoded path in a test or build script.
Points from: [anchor]
Points toward: [file]
Ordinals are used: never

ref/imports

Brief description: A ref/imports B if B is imported at the site of A.
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never

Import references a class. (Java)

//- @LinkedList ref/imports LL
import java.util.LinkedList;
public class E {
  //- @LinkedList ref LL
  LinkedList field;
}

ref/id

Brief description: A ref/id B if A does not otherwise ref or define B but the bytes of A’s span are determined by the identifier of B.
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never

A ref/id edge captures a relationship between multiple objects that are grammatically required to share a name, such as a class and its constructors.

Add a ref/id edge at a usage of the governed object (e.g., the constructor) pointing to the governing object (e.g., the class). This captures the fact that the two are linked, but allows a client to distinguish the cross-references of the constructor from those of the class.

Constructors and destructors are named after their classes. (C++)

//- @C defines/binding ClassC
struct C {
  //- @C defines/binding CtorC
  //- @C ref/id ClassC
  C(int i);
};

//- @"C(1)" ref/call CtorC
//- @C ref CtorC
//- @C ref/id ClassC
void f() { auto c = C(1); }

ref/includes

Brief description: A ref/includes F if A is an anchor that inlines the text of file F.
Points from: anchors
Points toward: files
Ordinals are used: never

Includes include files. (C++)

//- @"\"test.h\"" ref/includes HeaderFile
//- HeaderFile.node/kind file
#include "test.h"

#example test.h
// ...

ref/init

Brief description: A ref/init B if A is an anchor attached to an expression that initializes B, typically a field or variable.
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never

Initializer expressions init their fields. (Go)

package p

type S struct {
  //- @F defines/binding Field
  F int
}

// Positional
//- @"17" ref/init Field
var _ = S{17}

// Key-value
//- @F ref/writes Field
//- @"101" ref/init Field
var _ = S{F: 101}

ref/init/implicit

Brief description: A ref/init/implicit B if A is an anchor attached to an expression that initializes B, typically a field or variable, and the expression spanned by A is implicit (e.g., the result of a template instantiation).
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never

Initializers inside template instantiations are implicit. (C++)

template<typename T> class C {
  //- @"1" ref/init/implicit MemberX
  T t = {.x = 1};
};
//- @x defines/binding MemberX
struct S { int x; };
C<S> cs;

ref/queries

Brief description: A ref/queries M if A is an anchor that queries whether macro M is bound.
Points from: anchors
Points toward: macros
Ordinals are used: never

Queries to bound macros are recorded. (C++)

//- @FOO defines/binding MacroFoo
#define FOO BAR
//- @FOO ref/queries MacroFoo
//- MacroFoo.node/kind macro
#if defined(FOO)
#endif
//- !{@BAZ ref/queries _}
#ifdef BAZ
#endif

ref/writes

Brief description: A ref/writes B if A refers to B in an expression that is likely to update the value of B.
Commonly arises from: assignment, accessor functions
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never

Various assignment expressions are writes. (C++)

void f() {
  //- @x defines/binding VarX
  int x = 0;
  //- @x ref/writes VarX
  x = 1;
  // Some special forms may be supported.
  //- @#0x ref/writes VarX
  //- @#1x ref VarX
  *(&x) = x;
  // Not all forms are supported.
  //- @x ref VarX
  *(&x + 1) = 1;
}

ref/writes/thunk

Brief description: A ref/writes/thunk B if A refers to B in an expression that is likely to cause a later update to the value of B.
Commonly arises from: value uses of setter functions, alias passing
Points from: anchors
Points toward: semantic nodes
Ordinals are used: never
See also: [ref/writes]
Notes: This is an experimental definition and is expected to undergo refinement.

Various assignment expressions are thunkful writes. (C++)

void f(int* x_out, int& y_out, const int& z);
void g() {
  //- @x defines/binding VarX
  //- @y defines/binding VarY
  //- @z defines/binding VarZ
  int x, y, z;
  //- @x ref VarX
  //- @y ref VarY
  //- @z ref VarZ
  //- // @x ref/writes/thunk VarX -- currently unsupported
  //- // @y ref/writes/thunk VarY -- currently unsupported
  //- !{ @z ref/writes/thunk VarZ }
  f(&x, y, z);
}

satisfies

Brief description: A satisfies T if A is a type that implicitly satisfies a type T.
Points from: type nodes ([record], [tapp], etc.)
Points toward: type nodes ([interface], [tapp], etc.)
Ordinals are used: never

Concrete types satisfy nearby interfaces. (Go)

package sat

//- @Badger defines/binding Badger
//- Badger.node/kind interface
type Badger interface { HasBadge() bool }

//- @SB defines/binding StaticBadger
//- StaticBadger satisfies Badger
type SB bool

//- @HasBadge defines/binding HasBadge
//- HasBadge childof StaticBadger
func (s SB) HasBadge() bool { return bool(s) }

Types of overriding methods satisfy types of overridden interface methods. (Go)

package sat

//- @HasBadge defines/binding HasBadgeI
//- HasBadgeI typed HasBadgeIType
type Badger interface { HasBadge() bool }

type SB bool

//- @HasBadge defines/binding HasBadge
//- HasBadge typed HasBadgeType
//- HasBadgeType satisfies HasBadgeIType
//- ! { HasBadge typed HasBadgeIType }
func (s SB) HasBadge() bool { return bool(s) }

specializes

Brief description: A specializes B if A provides a declaration of a type specialization B.
Commonly arises from: template total and partial specialization
Points from: semantic nodes
Points toward: semantic nodes ([tapp])
Ordinals are used: never
See also: [instantiates], [instantiates/speculative]

Template specializations specialize. (C++)

//- @C defines/binding TemplateClassC
template <typename T> class C { };
//- @C defines/binding SpecializedClassC
template <> class C<int> { };
//- SpecializedClassC specializes TAppCInt
//- TAppCInt.node/kind tapp
//- TAppCInt param.0 TemplateClassC

Function templates specialize. (C++)

//- @id defines/binding IdFn
template <typename T> T id(T x) { return x; }
//- @id defines/binding IdSpecFn
template <> bool id(bool x) { return !(!x); }
//- IdSpecFn specializes TAppIdFnBool
//- TAppIdFnBool.node/kind tapp
//- TAppIdFnBool param.0 IdFn
//- TAppIdFnBool param.1 vname("bool#builtin",_,_,_,_)

specializes/speculative

See [instantiates/speculative].

tagged

Brief description: A tagged B if B labels A with a [diagnostic] message.
Commonly arises from: build/analysis errors
Points from: [anchor], [file]
Points toward: [diagnostic]
Ordinals are used: never

tparam

Brief description: A tparam.N B if B is the Nth type/template parameter of A.
Commonly arises from: templates, generic types
Points from: semantic nodes
Points toward: [tvar]
Ordinals are used: always
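
A consumer of ordinal edges usually needs to separate the base kind from the ordinal. The following Go sketch shows one way to do that for kinds written as kind.N; the function name splitOrdinal is hypothetical and is not part of any Kythe library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitOrdinal splits an edge kind such as "tparam.1" into its base kind
// ("tparam") and ordinal (1). hasOrdinal is false when the kind has no
// all-digit suffix after its last dot.
func splitOrdinal(edgeKind string) (base string, ordinal int, hasOrdinal bool) {
	i := strings.LastIndex(edgeKind, ".")
	if i < 0 || i == len(edgeKind)-1 {
		return edgeKind, 0, false
	}
	suffix := edgeKind[i+1:]
	for _, r := range suffix {
		if r < '0' || r > '9' {
			return edgeKind, 0, false
		}
	}
	n, err := strconv.Atoi(suffix)
	if err != nil {
		return edgeKind, 0, false
	}
	return edgeKind[:i], n, true
}

func main() {
	for _, kind := range []string{"/kythe/edge/tparam.0", "tparam.1", "typed"} {
		base, n, ok := splitOrdinal(kind)
		fmt.Println(base, n, ok)
	}
}
```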

Generics have ordered tparam edges. (Go)

package tparam

//- Func.node/kind function
//- TVar.node/kind tvar
//- UVar.node/kind tvar

//- Func tparam.0 TVar
//- Func tparam.1 UVar

//- @Map defines/binding Func
//- @#0T defines/binding TVar
//- @#0U defines/binding UVar
func Map[T any, U any](l []T, f func(T) U) []U {
        res := make([]U, len(l))
        for i, t := range l {
                res[i] = f(t)
        }
        return res
}

typed

Brief description: A is typed B if A has the type B.
Commonly arises from: terms with types; definitions and declarations
Points from: semantic nodes
Points toward: types
Ordinals are used: never

Enumerations can be ascribed types. (C++)

//- @E defines/binding EnumE
//- EnumE typed IntType
enum E : int;

Java methods are type applications of the builtin fn type. (Java)

//- @E defines/binding E
public class E {
  //- @func defines/binding Func
  //- Func typed FuncType
  //- FuncType.node/kind tapp
  //- FuncType param.0 FnBuiltin=vname("fn#builtin","","","","java")
  //- FuncType param.1 IntBuiltin=vname("int#builtin","","","","java")
  //- FuncType param.2 E
  //- FuncType param.3 String
  //- @String ref String
  int func(String p) { return 0; }
}

undefines

Brief description: A undefines M if A detaches M from M’s binding.
Commonly arises from: macro undefinition
Points from: anchors
Points toward: macros

Undef undefines macros. (C++)

//- @FOO defines/binding MacroFoo
#define FOO BAR
//- @FOO undefines MacroFoo
#undef FOO
//- @FOO defines/binding DifferentMacroFoo
#define FOO BAZ

Common node facts

Some facts can be attached to many different kinds of nodes. A subset of these (called tags) are interesting even if they have no associated values.

node/kind

Brief description: A node’s node/kind is a label describing the role of the node in the graph. The kind is the one fact every node must have in order to participate in the rest of the schema.
Attached to: all nodes

code

Brief description

A node’s code is a serialized MarkedSource message that can be used to describe that node.

For indexers implemented through the proxy API, code/json is an additional supported fact. Its value is interpreted as a JSON-encoded MarkedSource message and is rewritten to the equivalent wire-encoded code fact.

Attached to

semantic nodes

doc/uri

Brief description: If this node’s primary documentation exists outside the graph, this fact can hold a URI pointing to that documentation. For example, you may want to link builtin functions to their definitions in a language’s reference manual.
Attached to: semantic nodes

semantic/generated

Brief description: A node’s semantic/generated fact identifies the effect it has on nodes that generate it. Values may include set, indicating that this node changes its generator; or alias, indicating that it takes an alias to its generator.
Attached to: semantic nodes
See also: [generates]

When a target language’s indexer is processing code with attached metadata (e.g., code that was generated by another tool), it may come across definitions for which it should emit a P generates D edge. If the metadata indicates a nontrivial semantic, the indexer should attach a semantic/generated fact to D with the relevant string value.

tag/deprecated

Brief description: If this node should no longer be used, it should be marked with tag/deprecated. If this fact has a nonempty value, that value should be set to a UTF-8-encoded human-readable reason why the node was deprecated, and/or what should be done in the future.
Attached to: semantic nodes

tag/static

Brief description: In languages which make a distinction between class-bound and instance-bound members, members which are bound to the class should be marked with tag/static.
Attached to: semantic nodes

Static members are tagged. (Java)

class Wrapper {
  //- @A defines/binding StaticField
  //- StaticField.tag/static _
  public static int A = 1;
  //- @field defines/binding InstanceField
  //- !{ InstanceField.tag/static _ }
  public int field = 2;
}

tag/abstract

Brief description: In languages which have the concept of a non-instantiable class or method which must be defined by subclasses, such classes and methods should be marked with tag/abstract.
Attached to: semantic nodes

Node kinds

anchor

Brief description

An anchor connects concrete syntax to abstract syntax. An anchor is within the [file] with the same Path, Root, and Corpus ([file]s do not have a Language, so it is not used for matching).

Naming convention

Path: The Path of the [file] containing this anchor (or empty if there is no such file).
Root: The Root of the [file] containing this anchor.
Corpus: The Corpus of the [file] containing this anchor.
Language: The Language of the compilation producing this anchor.

Expected out-edges

[defines], [defines/binding], [ref], [ref/call]

Facts

loc/start: The starting byte offset (from 0) in the [file] containing this anchor.
loc/end: The ending byte offset (exclusive) in the [file] containing this anchor. If the loc/start and loc/end of an anchor are both 0, this anchor refers to the whole file (for example, if a file is the defines/binding site for a package or module).
snippet/start: The starting byte offset (from 0) of the snippet for this anchor (optional).
snippet/end: The ending byte offset (from 0) of the snippet for this anchor (optional).
build/config: A short name describing the build configuration or platform this anchor targets (optional).

subkind

If set to implicit, this anchor should not also have loc/start or loc/end facts. It is an artifact of some internal process that may still have important semantic effects.

See also

[file]

Anchor VNames are specified such that one may determine the VName of the [file] containing an anchor by dropping the anchor VName’s Language and Signature fields.
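
As a minimal sketch of this rule, the Go snippet below derives a file VName from an anchor VName. The VName struct is a hypothetical stand-in defined here for illustration, and the field values in main are made up.

```go
package main

import "fmt"

// VName is a hypothetical stand-in for a Kythe VName with its five fields.
type VName struct {
	Signature, Corpus, Root, Path, Language string
}

// fileVNameOf derives the VName of the file containing an anchor by
// dropping the anchor's Language and Signature fields.
func fileVNameOf(anchor VName) VName {
	return VName{Corpus: anchor.Corpus, Root: anchor.Root, Path: anchor.Path}
}

func main() {
	// Illustrative values only; real signatures are chosen by the indexer.
	anchor := VName{
		Signature: "example-anchor-signature",
		Corpus:    "example-corpus",
		Path:      "schema/example.cc",
		Language:  "c++",
	}
	fmt.Printf("%+v\n", fileVNameOf(anchor))
}
```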

Anchors have byte offsets. (C++)

int 錨;
//- VarNameAnchor.loc/start 4
//- VarNameAnchor.loc/end 7
// Note that the glyph 錨 is encoded in UTF-8 as [e9 8c a8].
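
The offsets in this example are byte offsets, not character offsets. The Go sketch below recomputes them for the same source text; byteSpan is a hypothetical helper, and it relies on Go strings being indexed by byte.

```go
package main

import (
	"fmt"
	"strings"
)

// byteSpan returns the loc/start and loc/end byte offsets of the first
// occurrence of token in src. The end offset is exclusive. ok is false if
// token does not occur in src.
func byteSpan(src, token string) (start, end int, ok bool) {
	start = strings.Index(src, token)
	if start < 0 {
		return 0, 0, false
	}
	// len counts bytes, so a three-byte UTF-8 glyph advances the end by 3.
	return start, start + len(token), true
}

func main() {
	start, end, ok := byteSpan("int 錨;", "錨")
	fmt.Println(start, end, ok) // 4 7 true
}
```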

Anchors have VName rules. (C++)

//- @foo=vname(_,Corpus,Root,Path,"c++").node/kind anchor
//- File=vname("",Corpus,Root,Path,"").node/kind file
int foo;

Anchors can overlap. (Java)

import java.util.Optional;
public class E {
  //- @"Optional<String>" ref TSpecClass
  //- @Optional ref OptClass
  //- @String ref StrClass
  Optional<String> f;
}

Implicit anchors arise from default constructors. (C++)

//- @C defines/binding ClassC
//- CCtor childof ClassC
//- CCtor.subkind constructor
//- CCtor.complete definition
class C { };
//- @D defines/binding ClassD
//- DCtor childof ClassD
//- DCtor.subkind constructor
//- DCtor.complete definition
class D { C c; };
D d;
//- ImplicitCallToCCtor.node/kind anchor
//- ImplicitCallToCCtor.subkind implicit
//- ImplicitCallToCCtor ref/call/implicit CCtor
//- ImplicitCallToCCtor childof DCtor

Anchors have VName rules. (Java)

public class E {
  //- @foo=vname(_,Corpus,Root,Path,"java").node/kind anchor
  int foo;
}
//- File=vname("",Corpus,Root,Path,"").node/kind file

Anchors have VName rules. (Go)

//- @anchor=vname(_,Corpus,Root,Path,"go").node/kind anchor
//- File=vname("",Corpus,Root,Path,"").node/kind file
package anchor

constant

Brief description

A constant is a value that can be statically determined.

Facts

text: A string representation of the constant.

See also

[sum]

Enumerators are constants. (C++)

enum E {
//- @EM defines/binding Enumerator
  EM = 42
};
//- Enumerator.node/kind constant
//- Enumerator.text 42

Enumeration values are constants. (Java)

public enum E {
  //- @A defines/binding A
  //- A.node/kind constant
  A;
}

diagnostic

Brief description

A diagnostic is a node with a message concerning some aspect of the related [file] or [anchor]. These can often result from errors while building or analyzing a compilation and allow an analyzer to surface errors to end-users. Each diagnostic is linked to the related [file] or [anchor] using a [tagged] edge (where the diagnostic node is the target of the edge). In principle, a diagnostic could also be related to a semantic node, but a user-facing UI might not be able to display anything meaningful without a concrete anchor.

Facts

message: A relatively short, one-line, human-readable string explaining the diagnostic. The encoding must be UTF-8.
details: Longer form of the message that exposes more detail concerning the diagnostic. This could be very specialized to the particular diagnostic, possibly even containing stack traces or build system logs (optional).
context/url: URL leading to more detailed information concerning this diagnostic or a related group of diagnostics (optional).

doc

Brief description

A doc is text that documents a node.

Facts

text: The text of the document. The encoding must be UTF-8.

Notes

Embedded references inside a doc node’s text are delimited using [ and ]. Ordinary brackets are escaped as \[ and \]; backslash is escaped as \\. Targets of embedded references are stored as param edges on the document, where the nth opening bracket is matched with the nth param. Indexers should strip off comment delimiters.
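
The escaping rule can be sketched in Go as follows. This is an illustration rather than indexer code: escapeDocText and countReferences are hypothetical names, and the sketch assumes a literal closing bracket is escaped with a backslash in the same way as an opening one.

```go
package main

import (
	"fmt"
	"strings"
)

// docEscaper escapes the characters that are special in doc text. The
// replacer works in a single pass, so inserted backslashes are not
// escaped a second time.
var docEscaper = strings.NewReplacer(`\`, `\\`, `[`, `\[`, `]`, `\]`)

// escapeDocText escapes literal brackets and backslashes so that they are
// not mistaken for embedded-reference delimiters.
func escapeDocText(raw string) string {
	return docEscaper.Replace(raw)
}

// countReferences counts unescaped opening brackets, which is the number
// of param edges the doc node is expected to carry.
func countReferences(text string) int {
	n := 0
	for i := 0; i < len(text); i++ {
		switch text[i] {
		case '\\':
			i++ // skip the escaped character
		case '[':
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(escapeDocText(`vec[i] \ 2`))
	fmt.Println(countReferences("It sums its parameters `[x]` and `[y]`."))
}
```

Text that has been passed through escapeDocText contains no unescaped brackets, so countReferences reports zero for it until the indexer adds its own reference delimiters.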

See also

[documents]

In the following example, the \n in the assertion about DocNode.text is stored as a newline in the graph. The escape is there for the verifier.

Doc nodes contain documentation text. (C++)

/// A function.
/// It sums its parameters `x` and `y`.
int f(int x, int y) {
  return x + y;
}
//- DocNode documents FnF
//- DocNode.node/kind doc
//- DocNode.text " A function.\n It sums its parameters `[x]` and `[y]`."
//- DocNode param.0 VarX
//- DocNode param.1 VarY
//- FnF param.0 VarX
//- FnF param.1 VarY

file

Brief description

A file is an array of bytes with a significant external name.

Naming convention

Language: empty
Path: External path to this file (or some other unique ID if this file is virtual)
Signature: empty

Facts

language: The source language, as used in VName language. This fact is optional; it is intended for use by source browsers, for things like source colorization.
text: Uninterpreted content as an array of bytes.
text/encoding: Encoding of the text fact. See http://www.w3.org/TR/encoding/#names-and-labels for standard values. If empty, "UTF-8" is assumed.

See also

[anchor], [ref/includes]

int x;
//- XAnchor=vname(_,Corpus,Root,Path,"c++").node/kind anchor
//- XAnchor.loc/start 4
//- XAnchor.loc/end 5
//- SourceFile=vname("",Corpus,Root,Path,"").node/kind file

flag

Brief description: A flag is a named parameter usually passed to a program from the command line.
Notes: Indexers can support various common flag libraries. Flag nodes from different libraries are given different kind labels. For example, the Abseil/Google flag libraries will produce flag/google nodes. The name and default value of a flag can be derived from its [code] fact.

interface

Brief description: An interface defines an implementable type.

Interfaces are interfaces. (Java)

public class E {
  //- @I defines/binding Interface
  //- Interface.node/kind interface
  public static interface I {}
}

function

Brief description

A function binds zero or more parameters and returns a result.

Facts

complete: incomplete if this is only a declaration; definition if it is a definition.
subkind: constructor for constructors; destructor for destructors; none or unspecified for normal or member functions.

Functions are functions. (C++)

//- @F defines/binding FnF
//- FnF.node/kind function
//- FnF.complete incomplete
//- @X defines/binding VarX
//- VarX.complete incomplete
//- FnF param.0 VarX
void F(int X);

lookup

Brief description

A lookup is a structured name whose resolution cannot be completed without additional context.

Facts

text: The deferred name to be resolved.

Notes

Name resolution can be a complicated problem. In C++ templates, the meaning of a dependent name cannot be determined until the template parameters it depends upon are supplied. Similarly, in dynamic languages like Python, name resolution may depend on the runtime context. Nevertheless, when we are unable to come up with a semantic representation of one or more nodes in a path-structured name, we record this name as a collection of lookup nodes. Each lookup node has some text (the dynamic lookup done at that node) as well as some params (to record the semantic object into which text is being used as a key).

Dependent names are lookups. (C++)

template
//- @T defines/binding DepT
<template <typename> class T>
struct C {
//- @D ref DepTIntD
using S = typename T<int>::D;
};
//- DepTIntD.text D
//- DepTIntD.node/kind lookup
//- DepTIntD param.0 DepTInt
//- DepTInt.node/kind tapp
//- DepTInt param.0 DepT
//- DepTInt param.1 Int

Lookups record paths. (C++)

template
<template <typename> class T>
struct C {
//- @F ref DepTIntDEF
//- DepTIntDEF.text F
//- @E ref DepTIntDE
//- DepTIntDE.text E
//- @D ref DepTIntD
//- DepTIntD.text D
using S = typename T<int>::D::E::F;
};
//- DepTIntDEF param.0 DepTIntDE
//- DepTIntDE param.0 DepTIntD

macro

Brief description: A macro is a metaprogram that operates on source text.
Notes: Macros are distinct from abs because they do not participate in the programming language proper. Instead, they are evaluated separately, usually before semantic analysis takes place.
See also: [ref/expands], [ref/expands/transitive], [ref/queries], [undefines]

Defines define macros. (C++)

//- @FOO defines/binding MacroFoo
//- MacroFoo.node/kind macro
#define FOO BAR

name

Brief description

A name specifies an external identifier for a node, typically used for linking.

Naming convention

Signature: The name string.
Language: The namespace to which the name belongs.
Path: empty
Root: empty
Corpus: empty

Notes

The namespace is some domain in which the names are expected to be unique at linkage time and/or runtime, such as the Itanium C++ ABI.

package

Brief description: A package defines a module containing declarations.
Notes: Languages in which a module is implicitly defined based on the file name should emit a defines/implicit edge from a zero-width anchor at offset 0 in that file to the corresponding package node.

Top-level declarations are children of package nodes. (Java)

//- @pkg ref Pkg
//- Pkg.node/kind package
package pkg;
//- @E defines/binding ClassE
//- ClassE childof Pkg
public class E {}

Files belonging to a package are children of that package. (Go)

//- @foo defines/binding Pkg
//- Pkg.node/kind package
package foo

//- File = vname("", _, _, "schema/example.go", "").node/kind file
//- File childof Pkg

process

Brief description

A process describes an abstract processing action in a workflow.

Facts

label: A string label used to identify the process (optional).

See also

[depends]

A process defines a processing action such as a step in a build or the execution of a continuous integration workflow. For workflows that assign identifying labels to processing steps (such as target names), the label should carry the name so assigned.

record

Brief description

A record defines a type composed of a collection of elements.

Facts

subkind: Language-specific subkind for this record.
complete: incomplete if this is only a declaration; definition if it is a definition.

Notes

This node is a nominal record such that two records with the same children but different names should always be considered to be distinct.

Classes are records. (C++)

//- @C defines/binding ClassCDecl
//- ClassCDecl.node/kind record
//- ClassCDecl.complete incomplete
class C;

//- @C defines/binding ClassCDefn
//- ClassCDefn.node/kind record
//- ClassCDefn.complete definition
class C { };

Classes are records. (Java)

package pkg;
//- @E defines/binding ClassE
//- ClassE.node/kind record
//- ClassE.subkind class
public class E {
}

sum

Brief description

A sum defines a type whose instances must choose one out of a set of possible representations.

Facts

subkind

Language-specific subkind for this sum.

complete

incomplete if this is only a declaration.
complete if this is a declaration that is considered usable by value.
definition if this provides a full description of the type.

Enums are sums. (C++)

//- @CE defines/binding EnumCE
//- EnumCE.node/kind sum
//- EnumCE.complete definition
enum CE { };

//- @E defines/binding EnumE
//- EnumE.node/kind sum
//- EnumE.complete incomplete
enum class E;

//- @E defines/binding EnumETyped
//- EnumETyped.node/kind sum
//- EnumETyped.complete complete
enum class E : int;

//- @E defines/binding EnumEDefn
//- EnumEDefn.node/kind sum
//- EnumEDefn.complete definition
enum class E : int { };

Enums are sum/enumClasses. (Java)

//- @E defines/binding EnumE
//- EnumE.node/kind sum
//- EnumE.subkind enumClass
public enum E {}

symbol

Brief description

A symbol is a common name used by tools to refer to a set of objects. The spelling of a symbol is defined per language, and should be constructible by tools that do not necessarily have direct access to the compiler. The rules for binding a symbol to a particular object from this set may depend on external configuration (such as the list of libraries being linked together to produce an executable).

Naming convention

Path: empty
Root: empty
Corpus: empty

talias

Brief description: A talias gives a new name to an existing type.
Expected out-edges: [aliases]
Notes: A talias may be virtually removed from the graph. Some languages may have additional reduction rules.

Type aliases are taliases. (C++)

//- @Counter defines/binding TAlias
//- TAlias.node/kind talias
using Counter = int;

tapp

Brief description: A tapp applies a type constructor or [abs] to zero or more parameters.
Expected out-edges: [param] (at least ordinal 0)

Pointers are type constructors. (C++)

//- @PtrInt defines/binding PtrIntAlias
//- PtrIntAlias aliases IntPtrType
using PtrInt = int*;
//- IntPtrType.node/kind tapp
//- IntPtrType param.0 vname("ptr#builtin",_,_,_,"c++")
//- IntPtrType param.1 vname("int#builtin",_,_,_,"c++")

Generic classes are type constructors. (Java)

import java.util.Optional;
public class E {
  //- @f defines/binding Field
  //- Field typed TSpecClass
  //- TSpecClass.node/kind tapp
  //- TSpecClass param.0 OptClass
  //- TSpecClass param.1 StrClass
  Optional<String> f;
}

tbuiltin

Brief description

A tbuiltin is a type that is supplied by the language itself.

Naming convention

Signature: language-specific-string#builtin

Notes

See the Language-specific rules section below for enumerations of these builtin types.

Int is a builtin. (C++)

//- @int ref TInt
//- TInt.node/kind tbuiltin
using Int = int;

tnominal

Brief description: A tnominal is a type that may be purely identified by its name.
Notes: When a tnominal's definition is known, some language-specific rules dictate that the definition node be used instead of a tnominal in the type graph.

Forward-declared classes are tnominals. (C++)

//- @C defines/binding ClassC
//- ClassC.node/kind record
class C;
//- @Alias defines/binding Alias
//- Alias aliases PtrC
//- PtrC param.1 NominalC
//- NominalC.node/kind tnominal
using Alias = C*;

tsigma

Brief description: A tsigma is an ordered list of types that is unpacked on substitution.
Expected out-edges: [param] (at least ordinal 0)

Parameter packs unpack tsigmas. (C++)

template <typename... Ts>
//- @f defines/binding FnTF
void f(Ts... ts) { }

//- @int ref IntType
//- @double ref DoubleType
//- @f ref AppFnTFSigma
int g(double x) { f(1, x); }

//- FnF instantiates AppFnTFSigma
//- FnF.node/kind function
//- AppFnTFSigma param.0 FnTF
//- AppFnTFSigma param.1 Sigma
//- Sigma.node/kind tsigma
//- Sigma param.0 IntType
//- Sigma param.1 DoubleType

tvar

Brief description: A tvar is a type/template parameter and is bound to a semantic node with a [tparam] edge.
See also: [tapp], [tparam], [specializes]

Type parameters are tvars. (Go)

package tvar

//- Container.node/kind record
//- TVar.node/kind tvar

//- Container tparam.0 TVar

//- @Container defines/binding Container
//- @T defines/binding TVar
type Container[T any] struct {
  //- @T ref TVar
  Element T
}

Type variables are tvars. (C++)

//- @C defines/binding TemplateC
//- @T defines/binding TVarT
//- TVarT.node/kind tvar
//- TemplateC tparam.0 TVarT
template <typename T> class C {
//- @T ref TVarT
  using S = T;
};

variable

Brief description

A variable is a location for storing data.

Facts

complete

incomplete if this is only a declaration.
definition if this is a variable definition.

Subkinds

local for variables in function scope.
local/parameter for variables passed into functions.
field for variables that are data members of some record.
import for variables that reference objects in other modules.

Variables are variables. (C++)

//- @x defines/binding VariableX
//- VariableX.node/kind variable
int x;

Fields are variables. (Java)

import java.util.Optional;
public class E {
  //- @f defines/binding Field
  //- Field.node/kind variable
  //- Field.subkind field
  Optional<String> f;
}

Parameters are variables. (Java)

public class E {
  //- @arg defines/binding Param
  //- Param.node/kind variable
  //- Param.subkind local/parameter
  void f(String arg) {}
}

Locals are variables. (Java)

public class E {
  void f() {
    //- @var defines/binding Local
    //- Local.node/kind variable
    //- Local.subkind local
    String var;
  }
}

vcs

Brief description

A vcs is a reference to a particular revision stored in a version control system.

Facts

vcs/id

A stable identifier for a revision in the repository. For example, a Git repository uses commit hashes as identifiers.

vcs/type

darcs: this is a Darcs repository.
git: this is a Git repository.
hg: this is a Mercurial repository.
perforce: this is a Perforce repository.
svn: this is a Subversion repository.

vcs/uri

A URI that points to the repository root. Acceptable values for this fact depend on the vcs/type.

Naming convention

When naming a vcs node, it is a good idea to use only the corpus field of a VName. You can then use that corpus value in the VNames of all nodes that are generated from that revision.

Notes

It is important that the vcs uses a stable reference to a revision. For example, using the name of a Git branch would not be a good idea, since Git branches point to different commits over time. It is better to use the (full) hash of the commit.

Variance

Some languages (like Objective-C) allow you to specify the variance of a type argument as it relates to the typing relationship of the class it is a parameter for. This is different from the bounds that may be placed on a type variable. The bounds are represented with [bounded] edges. Variance is stored as a fact in the node for the type variable.

For example, @interface G1<__covariant Type : P1*> : Root states that G1 is a generic type, G1 takes a single type parameter that has an upper bound of P1*, and G1<T> is a subtype of G1<U> if and only if T is a subtype of U.

Specifically for Objective-C, the default variance is invariant, so @interface G1<Type : P1*> : Root states that G1<T> is a subtype of G1<U> if and only if T == U.

The variance fact can be omitted, in which case covariance is assumed.
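
The subtyping consequences of each variance can be sketched in Go. The class hierarchy and the function names below are hypothetical, and the value "invariant" is an assumed spelling for the invariant case. An empty value is treated as covariant, following the statement above about an omitted fact.

```go
package main

import "fmt"

// parent is a hypothetical class hierarchy: P2 extends P1 extends Root.
var parent = map[string]string{"P2": "P1", "P1": "Root"}

// isSubtype reports whether t is u or a descendant of u in parent.
func isSubtype(t, u string) bool {
	for ; t != ""; t = parent[t] {
		if t == u {
			return true
		}
	}
	return false
}

// genericSubtype reports whether G<t> is a subtype of G<u> for a generic G
// whose single type parameter carries the given variance value.
func genericSubtype(variance, t, u string) bool {
	switch variance {
	case "covariant", "": // omitted fact: covariance is assumed
		return isSubtype(t, u)
	case "contravariant":
		return isSubtype(u, t)
	default: // assumed "invariant"
		return t == u
	}
}

func main() {
	fmt.Println(genericSubtype("covariant", "P2", "P1"))     // true
	fmt.Println(genericSubtype("contravariant", "P1", "P2")) // true
	fmt.Println(genericSubtype("invariant", "P2", "P1"))     // false
}
```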

Variance for a generic type (ObjC)

@interface Root
@end

@interface P1 : Root
@end

@interface P2 : P1
@end

//- @Type defines/binding TypeVar1
//- @G1 defines/binding G1Abs
//- TypeVar1.node/kind tvar
//- TypeVar1.variance covariant
//- G1Abs tparam.0 TypeVar1
@interface G1<__covariant Type> : Root
@end

//- @Type defines/binding TypeVar2
//- @G2 defines/binding G2Abs
//- TypeVar2.node/kind tvar
//- TypeVar2.variance contravariant
//- G2Abs tparam.0 TypeVar2
@interface G2<__contravariant Type> : Root
@end

int main(int argc, char **argv) {
  // Example of variance in action.
  G1<P2*> *g1var = [[G1 alloc] init];
  G1<P1*> *g1var2 = g1var;

  G2<P1*> *g2var = [[G2 alloc] init];
  G2<P2*> *g2var2 = g2var;

  return 0;
}

Language-specific rules

C++

C++'s source language is spelled "c++".

Builtin types

C++ supplies the following [tbuiltin] nodes by default:

Builtin type nodes (C++)

//- @"void" ref vname("void#builtin","","","","c++")
using Void = void;

//- @PtrVoid defines/binding AliasTappPtrVoid
//- AliasTappPtrVoid aliases TappPtrVoid
//- TappPtrVoid param.0 vname("ptr#builtin","","","","c++")
using PtrVoid = void*;

//- @"int" ref vname("int#builtin","","","","c++")
using Int = int;

//- @ConstVoid defines/binding TappConstVoidAlias
//- TappConstVoidAlias aliases TAppConstVoid
//- TAppConstVoid param.0 vname("const#builtin","","","","c++")
using ConstVoid = const void;

//- @VolatileVoid defines/binding TappVolatileVoidAlias
//- TappVolatileVoidAlias aliases TAppVolatileVoid
//- TAppVolatileVoid param.0 vname("volatile#builtin","","","","c++")
using VolatileVoid = volatile void;

///- @RestrictPtrVoid defines/binding TappRestrictPtrVoidAlias
///- TappRestrictPtrVoidAlias aliases TAppRestrictPtrVoid
///- TAppRestrictPtrVoid param.0 vname("restrict#builtin","","","","c++")
using RestrictPtrVoid = void * __restrict__;

Record and sum subkinds

C++ defines the following subkinds for [record] nodes:

Record subkinds (C++)

//- @C defines/binding ClassC
//- ClassC.subkind class
class C;

//- @S defines/binding StructS
//- StructS.subkind struct
struct S;

//- @U defines/binding UnionU
//- UnionU.subkind union
union U;

C++ defines the following subkinds for [sum] nodes:

Sum subkinds (C++)

//- @E defines/binding EnumE
//- EnumE.subkind enum
enum E { };

//- @EC defines/binding EnumClassEC
//- EnumClassEC.subkind enumClass
enum class EC;

References to definitions and declarations of types

If the indexer has available a definition of a C++ node, edges should be drawn directly to that node:

Refer to definitions directly. (C++)

//- @C defines/binding ClassCDefn
class C { };
//- @Alias defines/binding CAlias
//- CAlias aliases ClassCDefn
using Alias = C;

If the indexer only has a complete C++ node, or if the node is incomplete, edges should be drawn to a [tnominal] node:

Refer to complete or incomplete declarations indirectly. (C++)

//- @E defines/binding CompleteEnumE
enum class E : int;
//- @Alias defines/binding EAlias
//- EAlias aliases EnumETNominal
//- EnumETNominal.node/kind tnominal
using Alias = E;

When generating the name of a C++ type that requires looking down some edge, keep the following in mind. If multiple nodes are connected by that edge, consistently prefer one whose complete fact is set to definition; failing that, prefer one whose complete fact is set to complete; failing that, consistently prefer an arbitrary node from the edge-connected set (see [record], [sum]).
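
The preference order above can be sketched as a small selection function. This is only an illustration: the Node type, the Prefer function, and the signature-based tie-break are hypothetical names and choices made for this sketch, not part of any Kythe API. The schema only asks that the choice be consistent.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, simplified view of a graph node: a signature plus
// the value of its complete fact ("definition", "complete", or "incomplete").
type Node struct {
	Signature string
	Complete  string
}

// rank orders complete values from most to least preferred.
func rank(complete string) int {
	switch complete {
	case "definition":
		return 0
	case "complete":
		return 1
	default:
		return 2
	}
}

// Prefer returns the node to follow among the candidates reached by one edge.
// Ties are broken by signature so that the choice is the same on every run;
// that tie-break is an assumption of this sketch, not a schema requirement.
func Prefer(candidates []Node) (Node, bool) {
	if len(candidates) == 0 {
		return Node{}, false
	}
	sorted := append([]Node(nil), candidates...)
	sort.SliceStable(sorted, func(i, j int) bool {
		ri, rj := rank(sorted[i].Complete), rank(sorted[j].Complete)
		if ri != rj {
			return ri < rj
		}
		return sorted[i].Signature < sorted[j].Signature
	})
	return sorted[0], true
}

func main() {
	n, _ := Prefer([]Node{
		{"C#decl2", "incomplete"},
		{"C#defn", "definition"},
		{"C#decl1", "complete"},
	})
	fmt.Println(n.Signature)
}
```

Sorting a copy keeps the caller's slice untouched, and the stable, total ordering means the same candidate set always yields the same result regardless of input order.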

Qualifiers on types

The const, restrict, and volatile qualifiers may be applied to types. These are represented as type constructors. The indexer always applies them in the same order (const innermost, then restrict, then volatile) and collapses redundant qualifiers should they arise (const const becomes const). Tools should ideally canonicalize types according to these rules (for instance, after removing a [talias] node).

Qualifiers have canonical order. (C++)

//- @U defines/binding VRCAlias
//- VRCAlias aliases VRCInt
using U = int * __restrict__ const volatile;
//- @V defines/binding AnotherAlias
//- AnotherAlias aliases VRCInt
using V = int * volatile __restrict__ const;

Redundant CVR-qualifiers are dropped. (C++)

#arguments -Wno-duplicate-decl-specifier
//- @U defines/binding CIAlias
//- CIAlias aliases CIType
using U = const const int;
//- @V defines/binding AnotherCIAlias
//- AnotherCIAlias aliases CIType
using V = const int;
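
A tool that wants to apply the same canonicalization could proceed roughly as follows. This is a minimal sketch, assuming qualifiers are available as plain strings; Canonicalize, Render, and the textual notation for type applications are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// canonicalOrder lists qualifiers from innermost to outermost, following the
// order given in the text: const, then restrict, then volatile.
var canonicalOrder = []string{"const", "restrict", "volatile"}

// Canonicalize takes the qualifiers written on a type, in any order and with
// possible repeats, and returns them deduplicated in canonical order
// (innermost first). Unknown qualifiers are ignored in this sketch.
func Canonicalize(quals []string) []string {
	seen := make(map[string]bool)
	for _, q := range quals {
		seen[q] = true
	}
	var out []string
	for _, q := range canonicalOrder {
		if seen[q] {
			out = append(out, q)
		}
	}
	return out
}

// Render wraps base in the canonical qualifiers, innermost first, using a
// made-up textual notation for type applications.
func Render(base string, quals []string) string {
	s := base
	for _, q := range Canonicalize(quals) {
		s = q + "(" + s + ")"
	}
	return s
}

func main() {
	fmt.Println(Render("int*", []string{"restrict", "const", "volatile"}))
	fmt.Println(Render("int*", []string{"volatile", "restrict", "const"}))
	fmt.Println(Render("int", []string{"const", "const"}))
}
```

The first two calls produce the same string, mirroring how U and V alias the same node in the first example, and the third collapses the duplicated const as in the second example.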

Function types

The fn#builtin type constructor is used to represent function types. Its first parameter is the return type; its second parameter is the receiver type; subsequent parameters are arguments. Because the constructor itself occupies param.0 of the resulting [tapp] node, the return type appears at param.1 and the receiver type at param.2. Functions without an explicit return type will return a language-specific "void" type. Functions without a receiver type will use a language-specific "empty" receiver type.

C++ function types use a builtin type constructor. (C++)

//- @U defines/binding UAlias
//- UAlias aliases TAppFn
//- TAppFn param.0 vname("fn#builtin",_,_,_,_)
//- TAppFn param.1 vname("int#builtin",_,_,_,_)
//- TAppFn param.2 vname("short#builtin",_,_,_,_)
//- TAppFn param.3 vname("float#builtin",_,_,_,_)
using U = int(short, float);
// TODO(#3613): add receiver type to C++ function types

For K&R-style prototypes in C, the indexer will use the knrfn#builtin type.

Function types use a builtin type constructor. (Go)

package foo

//- @fn defines/binding Func
//- Func typed FuncType
//- FuncType.node/kind tapp
//- FuncType param.0 FnBuiltin=vname("fn#builtin",_,_,_,_)
//- FnBuiltin.node/kind tbuiltin
func fn() {}

Go void functions return the empty tuple type. (Go)

package foo

//- @fn defines/binding Func
//- Func typed FuncType
//- FuncType param.1 EmptyTuple
//- EmptyTuple.node/kind tapp
//- EmptyTuple param.0 TupleBuiltin=vname("tuple#builtin",_,_,_,_)
//- TupleBuiltin.node/kind tbuiltin
//- ! { EmptyTuple param.1 _ }
func fn() {}

Go functions have an empty tuple type receiver. (Go)

package foo

//- @fn defines/binding Func
//- Func typed FuncType
//- FuncType param.2 EmptyTuple
//- EmptyTuple.node/kind tapp
//- EmptyTuple param.0 TupleBuiltin=vname("tuple#builtin",_,_,_,_)
//- TupleBuiltin.node/kind tbuiltin
//- ! { EmptyTuple param.1 _ }
func fn() {}

Go methods have a non-empty receiver type. (Go)

package foo

//- @S defines/binding S
type S struct {}

//- @Method defines/binding Method
//- Method typed MethodType
//- MethodType param.2 S
func (S) Method() {}

//- @PMethod defines/binding PMethod
//- PMethod typed PMethodType
//- PMethodType param.2 SPointer
//- SPointer.node/kind tapp
//- SPointer param.0 vname("pointer#builtin",_,_,_,_)
//- SPointer param.1 S
func (*S) PMethod() {}

Java constructors have their parent class as a return/receiver type. (Java)

//- @E defines/binding E
public class E {
  //- @E defines/binding ECtor
  //- ECtor typed FnType
  //- FnType.node/kind tapp
  //- FnType param.0 vname("fn#builtin",_,_,_,_)
  //- FnType param.1 E
  //- FnType param.2 E
  public E() {}
}

Java static methods have a void receiver type. (Java)

public class E {
  //- @f defines/binding F
  //- F typed FnType
  //- FnType.node/kind tapp
  //- FnType param.0 vname("fn#builtin",_,_,_,_)
  //- FnType param.1 vname("int#builtin",_,_,_,_)
  //- FnType param.2 vname("void#builtin",_,_,_,_)
  public static int f() { return 0; }
}
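
The parameter layout shared by the Go and Java examples above can be summarized with a short sketch. FnParams is a hypothetical helper, and type names are plain strings here; it is not an indexer API. Note that the C++ example does not yet include a receiver slot (see the TODO there), so this layout does not describe C++ output.

```go
package main

import "fmt"

// FnParams lays out the parameters of a function-type tapp as described in
// the text: the constructor, then the return type, then the receiver type,
// then the argument types. Type names are plain strings for illustration.
func FnParams(ret, receiver string, args ...string) []string {
	params := []string{"fn#builtin", ret, receiver}
	return append(params, args...)
}

func main() {
	// Shaped like the Java static method example: int f() with a void receiver.
	for i, p := range FnParams("int#builtin", "void#builtin") {
		fmt.Printf("param.%d %s\n", i, p)
	}
}
```

With this layout the first declared argument always lands at param.3, which matches the indices used in the Java builtin type example.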

Structural hashes

[record] and [sum] definitions are given vnames with signatures composed of their lexical names and their structural hash, which unifies equivalent definitions that appear across distinct and unrelated translation units.

Template template parameters

Template template parameters are represented from the outside in. In this example, the top-level template C’s first [tparam] is the template template parameter B. This is stored as a [tvar]. Then the ordinary template parameter A is B’s first [tparam].

We do not represent higher kinds (C++)

//- @A defines/binding TvarA
//- @B defines/binding TvarB
//- @C defines/binding TemplateC
template <template <typename A> class B> class C;
//- TemplateC tparam.0 TvarB
//- TvarB tparam.0 TvarA

Special values for dependent lookups

Sometimes, the indexer must synthesize a [lookup] node to a constructor or destructor without knowing the name of the type being constructed or destroyed. In this case, the constructor (or destructor) is named #ctor (or #dtor):

Dependent ctors and dtors (C++)

//- @T defines/binding TyvarT
template <typename T>
class C : T {
  //- @"T()" ref/call LookupTCtor
  //- LookupTCtor.node/kind lookup
  //- LookupTCtor param.0 TyvarT
  //- LookupTCtor.text "#ctor"
  C() : T() { }

  T *t;
  //- @"delete t" ref/call LookupTDtor
  //- LookupTDtor.node/kind lookup
  //- LookupTDtor param.0 TyvarT
  //- LookupTDtor.text "#dtor"
  void f() { delete t; }
};

Go

The source language for Go is spelled "go".

Type Definitions

A Go type definition like type Foo Bar creates a new named type Foo with the same structure as Bar but with a distinct method set. In the Kythe schema we model Foo as a [record] node. If the underlying type is not already a struct, this node is given the subkind type; otherwise it has the subkind struct.

Type definitions (Go)

package tdef

//- @Foo defines/binding Foo
//- Foo.node/kind record
//- Foo.subkind type
type Foo int

type bar struct { z int }

//- @Bar defines/binding Bar
//- Bar.node/kind record
//- Bar.subkind struct
type Bar bar

//- @Pbar defines/binding Pbar
//- Pbar.node/kind record
//- Pbar.subkind type
type Pbar []bar
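
The subkind rule can be checked against the standard go/types package. The following is a sketch of how a tool might classify named types, not the Go indexer's actual implementation; the Subkinds function and the example constant are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// Subkinds type-checks src and returns, for every package-level named type,
// the subkind suggested by the rule in the text: "struct" when the
// underlying type is a struct, "type" otherwise.
func Subkinds(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "tdef.go", src, 0)
	if err != nil {
		return nil, err
	}
	// The example source has no imports, so no Importer is configured.
	conf := types.Config{}
	pkg, err := conf.Check("tdef", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	scope := pkg.Scope()
	for _, name := range scope.Names() {
		tn, ok := scope.Lookup(name).(*types.TypeName)
		if !ok {
			continue // skip functions, variables, and constants
		}
		if _, isStruct := tn.Type().Underlying().(*types.Struct); isStruct {
			out[name] = "struct"
		} else {
			out[name] = "type"
		}
	}
	return out, nil
}

const example = `package tdef

type Foo int
type bar struct{ z int }
type Bar bar
type Pbar []bar
`

func main() {
	kinds, err := Subkinds(example)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"Foo", "Bar", "Pbar"} {
		fmt.Println(name, kinds[name])
	}
}
```

Underlying() looks through the chain of named types, which is why Bar, defined in terms of bar, is still classified as a struct while Pbar, a slice of bar, is not.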

Java

Java’s source language is spelled "java".

Builtin types

Java supplies the following [tbuiltin] nodes by default:

Builtin type nodes (Java)

public class E {
  //- @f defines/binding F
  //- F typed FnType
  //- FnType.node/kind tapp
  //- FnType param.0 FnBuiltin = vname("fn#builtin","","","","java")
  //- FnType param.1 VoidBuiltin = vname("void#builtin","","","","java")
  public static void f(
    //- FnType param.3 BooleanBuiltin = vname("boolean#builtin","","","","java")
    boolean bool,
    //- FnType param.4 ByteBuiltin = vname("byte#builtin","","","","java")
    byte b,
    //- FnType param.5 ShortBuiltin = vname("short#builtin","","","","java")
    short s,
    //- FnType param.6 IntBuiltin = vname("int#builtin","","","","java")
    int i,
    //- FnType param.7 LongBuiltin = vname("long#builtin","","","","java")
    long l,
    //- FnType param.8 CharBuiltin = vname("char#builtin","","","","java")
    char c,
    //- FnType param.9 FloatBuiltin = vname("float#builtin","","","","java")
    float f,
    //- FnType param.10 DoubleBuiltin = vname("double#builtin","","","","java")
    double d,
    //- FnType param.11 StrArray
    //- StrArray.node/kind tapp
    //- StrArray param.0 ArrayBuiltin = vname("array#builtin","","","","java")
    //- StrArray param.1 String
    String[] arry) {}
}

Node Subkinds

Classes and Enums

In Java, classes are [record] nodes with a subkind of class. Likewise, enum classes are [sum] nodes with a subkind of enumClass.

Classes and enums (Java)

//- @E defines/binding EClass
//- EClass.node/kind record
//- EClass.subkind class
public class E {

  //- @Enum defines/binding Enum
  //- Enum.node/kind sum
  //- Enum.subkind enumClass
  static enum Enum {}
}

Functions

All methods are [function] nodes, including class constructors. To differentiate between constructors and other methods, nodes for constructors have the subkind constructor.

Methods and constructors (Java)

public class E {

  //- @E defines/binding ECtor
  //- ECtor.node/kind function
  //- ECtor.subkind constructor
  public E() {}

  //- @staticMethod defines/binding StaticMethod
  //- StaticMethod.node/kind function
  public static void staticMethod() {}

  //- @instanceMethod defines/binding InstanceMethod
  //- InstanceMethod.node/kind function
  public void instanceMethod() {}
}

Variables

Java has 5 types of [variable] nodes, each with a distinct subkind:

Fields: field subkind
Locals: local subkind
Exception Variables (see catch blocks): local/exception subkind
Parameters: local/parameter subkind
Resource Variables (see the try-with-resources statement): local/resource subkind

Variables (Java)

import java.io.IOException;
import java.io.OutputStream;

public class E {

  //- @field defines/binding Field
  //- Field.node/kind variable
  //- Field.subkind field
  private final Object field = null;

  //- @param defines/binding Parameter
  //- Parameter.node/kind variable
  //- Parameter.subkind local/parameter
  public static void m(String param) throws IOException {

    //- @local defines/binding Local
    //- Local.node/kind variable
    //- Local.subkind local
    int local = 42;

    //- @resource defines/binding ResourceVar
    //- ResourceVar.node/kind variable
    //- ResourceVar.subkind local/resource
    try (OutputStream resource = System.out) {
      resource.write("hello".getBytes());

      //- @exception defines/binding ExceptionVar
      //- ExceptionVar.node/kind variable
      //- ExceptionVar.subkind local/exception
    } catch (IOException exception) {}
  }
}

Protocol Buffers

The source language for Protocol Buffers is spelled "protobuf".

Common Lisp

The source language for Common Lisp is spelled "lisp".