GraphSAGE
Proposal
Main paper ("Inductive Representation Learning on Large Graphs", Hamilton et al., 2017) can be found here. SAGE = SAmple and aggreGatE
- Learns parametric functions that generate low-dimensional embeddings of the nodes in a graph
- These functions can also be used to generate embeddings for unseen nodes
- Can optionally use node features as starting points for the embeddings
- Configurable number of layers representing the number of hops for each node
- Custom loss function (based on negative-sampling) for unsupervised learning
- Also supports supervised learning
- Scales to large graphs through neighborhood sampling and minibatch techniques
forward propagation algo
- $$h_v^0 = x_v$$
- for $$k = 1 \dots K$$:
  - for $$v$$ in minibatch:
    - $$h_{N(v)}^k = \text{aggregator}_k(\{h_u^{k-1}, \forall u \in N(v)\})$$
    - $$h_v^k = \sigma(W^k \cdot \text{concat}(h_v^{k-1}, h_{N(v)}^k))$$
    - $$h_v^k = \frac{h_v^k}{\lVert h_v^k \rVert_2}$$
- for $$v$$ in minibatch:
  - $$z_v = h_v^K$$
Where:
- $$h_v^k$$ = hidden vector of node $$v$$ at layer $$k$$
- $$V$$ = set of nodes in the graph
- $$N(v)$$ = sampled neighborhood for each node
- $$K$$ = number of layers (hyperparameter)
- $$\sigma$$ = a non-linear activation function
- $$W^k$$ = learnable weights for each layer
- $$z$$ = output embedding
In the paper, the authors show the similarity of this algorithm to the popular Weisfeiler-Lehman graph isomorphism test.
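A minimal NumPy sketch of this forward pass, assuming a dict-of-lists adjacency structure and a mean aggregator (all function and variable names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def l2_normalize(h, eps=1e-12):
    return h / (np.linalg.norm(h) + eps)

def graphsage_forward(x, neighbors, W, sample_sizes, rng, sigma=np.tanh):
    """x: dict node -> input feature vector (h^0)
    neighbors: dict node -> list of neighbor ids
    W: list of K weight matrices, W[k] with shape (d_out, 2 * d_in)
    sample_sizes: per-layer neighborhood sample sizes S_1 .. S_K"""
    h = dict(x)                                              # h_v^0 = x_v
    for Wk, Sk in zip(W, sample_sizes):                      # k = 1 .. K
        h_next = {}
        for v in h:
            # uniformly sample a fixed-size neighborhood N(v)
            nbrs = rng.choice(neighbors[v], size=Sk, replace=True)
            h_nv = np.mean([h[u] for u in nbrs], axis=0)     # aggregator_k (mean of sampled neighbors)
            h_cat = np.concatenate([h[v], h_nv])             # concat(h_v^{k-1}, h_{N(v)}^k)
            h_next[v] = l2_normalize(sigma(Wk @ h_cat))      # sigma(W^k ...) followed by L2 normalization
        h = h_next
    return h                                                 # z_v = h_v^K

# toy usage with K = 2 layers
rng = np.random.default_rng(0)
x = {v: rng.normal(size=8) for v in range(4)}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
W = [rng.normal(size=(16, 16)), rng.normal(size=(16, 32))]
z = graphsage_forward(x, neighbors, W, sample_sizes=[2, 2], rng=rng)
```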
neighborhood sampling
- performs uniform sampling of a fixed-size set of neighbors instead of using full neighborhoods (sketched below)
- a different sample is drawn at each layer, with a per-layer sample size $$S_i$$
- the authors recommend keeping $$\prod_i S_i \le 500$$
- sampling technique is pretty much the same as seen in [PinSage]({{ site.baseurl }}{% post_url 2020-06-10-pinsage %})
- They recommend $$K = 2, S_1 = 25, S_2 = 10$$
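A small sketch of the fixed-size uniform sampling step, assuming the same dict-of-lists adjacency as above (names are illustrative):

```python
import numpy as np

def sample_neighborhood(adj, v, sample_size, rng):
    """Uniformly sample a fixed number of neighbors of v,
    sampling with replacement when v has fewer neighbors than requested."""
    nbrs = adj[v]
    return rng.choice(nbrs, size=sample_size, replace=len(nbrs) < sample_size)

rng = np.random.default_rng(0)
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
# With the recommended K = 2, S_1 = 25, S_2 = 10, each batch node expands into
# at most 25 * 10 = 250 sampled nodes, within the prod_i S_i <= 500 budget.
print(sample_neighborhood(adj, 0, 2, rng))
```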
learning
- trained with minibatch stochastic gradient descent, using the Adam optimizer
- loss function for unsupervised mode is based on negative sampling and its equation is as follows: $$J(z_u) = -\log(\text{sigmoid}(z_u^T z_v)) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)}[\log(\text{sigmoid}(-z_u^T z_{v_n}))]$$
- $$Q$$ = number of negative samples
- $$v_n$$ = negative sample, drawn from a negative-sampling distribution $$P_n(v)$$
- $$v$$ = node that co-occurs near $$u$$ (e.g. on a fixed-length random walk), i.e. a positive example
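A sketch of this loss in NumPy, assuming the expectation over negatives is approximated by averaging the $$Q$$ drawn samples (names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(z_u, z_v, z_neg):
    """z_u, z_v: embeddings of node u and of a co-occurring (positive) node v
    z_neg: (Q, d) array with the embeddings of Q negative samples drawn from P_n(v)"""
    pos_term = -np.log(sigmoid(z_u @ z_v))                   # pull co-occurring nodes together
    Q = z_neg.shape[0]
    neg_term = -Q * np.mean(np.log(sigmoid(-z_neg @ z_u)))   # push negative samples apart
    return pos_term + neg_term

rng = np.random.default_rng(0)
print(unsupervised_loss(rng.normal(size=16), rng.normal(size=16), rng.normal(size=(5, 16))))
```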
aggregators
These must be permutation invariant, since a node's neighbors have no natural ordering. The authors test the following aggregators:
mean aggregator
- $$h_v^k = \sigma(W^k \cdot \text{mean}(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in N(v)\}))$$
- note that this folds $$h_v^{k-1}$$ into the mean itself and drops the concat step of the general algorithm
- the paper calls this the "convolutional" variant; the concatenation it omits can be viewed as a simple form of "skip connection" between layers
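A sketch of a single mean-aggregator layer under these assumptions (NumPy; function and variable names are illustrative):

```python
import numpy as np

def mean_aggregator_layer(h_v, h_neighbors, W, sigma=np.tanh):
    """h_v: (d,) previous-layer vector of node v
    h_neighbors: (S, d) previous-layer vectors of the sampled neighbors N(v)
    W: (d_out, d) weight matrix for this layer"""
    # h_v^{k-1} is folded into the mean itself; there is no separate concat step
    stacked = np.vstack([h_v[None, :], h_neighbors])
    return sigma(W @ stacked.mean(axis=0))
```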
LSTM aggregator
- LSTMs are not permutation invariant
- the authors adapt them to unordered neighbor sets by applying the LSTM to a random permutation of the node's neighbors (see the sketch below)
- this seems to show the best accuracy in their experimentation
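A sketch of the random-permutation idea, assuming a PyTorch LSTM as the aggregator (module and names here are illustrative, not the paper's implementation):

```python
import torch
import torch.nn as nn

d = 16
lstm = nn.LSTM(input_size=d, hidden_size=d, batch_first=True)

def lstm_aggregate(h_neighbors):
    """h_neighbors: (S, d) tensor of previous-layer neighbor vectors."""
    perm = torch.randperm(h_neighbors.shape[0])          # random permutation of the neighbor set
    _, (h_n, _) = lstm(h_neighbors[perm].unsqueeze(0))   # run the LSTM over the shuffled sequence
    return h_n[-1, 0]                                    # final hidden state as the aggregate
```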
pooling aggregator
- $$h_{N(v)}^k = \max_{u \in N(v)}(\sigma(W_{pool}^k h_u^k + b^k))$$
- aggregator can be either max or mean, both seem to provide similar accuracy
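A sketch of the pooling aggregator under these assumptions (NumPy; $$W_{pool}$$, $$b$$ and function names are illustrative):

```python
import numpy as np

def pooling_aggregate(h_neighbors, W_pool, b, sigma=np.tanh, pool="max"):
    """h_neighbors: (S, d) neighbor vectors for node v
    W_pool: (d_pool, d) weights and b: (d_pool,) bias of the per-neighbor MLP"""
    transformed = sigma(h_neighbors @ W_pool.T + b)     # sigma(W_pool h_u + b) for every neighbor
    if pool == "max":
        return transformed.max(axis=0)                  # element-wise max over the neighborhood
    return transformed.mean(axis=0)                     # mean pooling performs similarly
```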