Kythe - The <code>influences</code> relation

Kythe stores information about dataflow in the graph using influences edges. A node A influences a node B if A directly affects B during the evaluation of a program. This relationship is not intended to model all dataflow interactions. In particular, it excludes nontrivial aliasing (that can’t be decided within the scope of a single compilation unit). Despite this, the influence relation can be used to trace simple paths through a program by exploring the subgraph iteratively, allowing for non-universal queries that fit the following examples:

find places where a flag passed to a program directly affects a different program value
find places where the return value from a sensitive function is passed to an insensitive sink, like leaking a key to a logging function
identify the places that affect a program value that was assigned something unexpected at runtime

This document is intended to describe how to add the influences relationship to a Kythe indexer based on common language features. Note that this document is still under active development.

General framework

We assume that the indexer being amended performs an AST walk over a tree with resolved references (so it’s possible to construct VNames for, e.g., a variable that is referenced). The indexer should maintain a stack of sets of influencers (using whichever type is appopriate; for C++, we use pointers to declarations). As the indexer enters and leaves various kinds of AST node and encounters others, it will manipulate this stack.

Names follow this convention:

v is a simple program variable.
p is a parameter.
e is an expression.

Popping the set "onto" an object o with node n means removing the top set from the influencers stack and recording i influences n edges for all nodes i for the objects in that top set.

The blame context is the node or nodes that would be assigned responsibility for making a function call at the current point in the AST. For example:

int f() { g(); }

Here, the call to g would be assigned to blame context f.

Variables

Simple reference

Insert v.

Simple assignment

v := e

Push {}.
Traverse e.
Pop onto v.

A simple expression. (C++)

void f() {
  //- @x defines/binding VarX
  int x = 0;
  //- @y defines/binding VarY
  int y = 1;
  //- // VarX influences VarY
  y = x;
}

Functions

Simple calls

e(e1 ... eN)

Traverse e.
For each of e1 ... eN with corresponding parameters p1 ... pN:
Push {}.
Traverse eI.
Pop onto pI.
Insert e.

Returns

return e;

Push {}.
Traverse e.
Pop onto the blame context.

A function call. (C++)

//- @x defines/binding ParamX
//- @g defines/binding FnG
int g(int x) {
  //- ParamX influences FnG
  return x + 1;
}

void f() {
  //- @y defines/binding VarY
  int y = 0;
  //- @z defines/binding VarZ
  int z;
  //- VarY influences ParamX
  //- FnG influences VarZ
  z = y + g(y);
}

Forward declarations

f'(p'1 ... p'N);
f(p1 ... pN) { ... }

p'I influences pI.
f influences f'.

Forward declarations. (C++)

//- @f defines/binding FwdFnF
//- @x defines/binding FwdParamX
//- @y defines/binding FwdParamY
int f(int x, int y);

//- @f defines/binding DefnFnF
//- @x defines/binding DefnParamX
//- @y defines/binding DefnParamY
int f(int x, int y) {
  return x + y;
}

//- FwdParamX influences DefnParamX
//- FwdParamY influences DefnParamY
//- DefnFnF influences FwdFnF