Unsupervised Graph Learning with BipartiteGraphSage

Bipartite graphs are very common in e-commerce recommendation. In this tutorial, we demonstrate how GraphScope trains a model with BipartiteGraphSage on a bipartite graph.

The task is link prediction, which estimates the probability of links between user and item nodes in a graph.

In this task, we use our implementation of the BipartiteGraphSage algorithm to build a model that predicts user-item links in the U2I dataset, in which each node represents either a user or an item. The task can be treated as unsupervised link prediction on a bipartite user-item network.

The BipartiteGraphSage algorithm compresses both the structural and the attribute information in the graph into a low-dimensional embedding vector for each node. These embeddings can then be used to predict links between nodes.
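
As a rough illustration of how embeddings can be turned into a link prediction, the sketch below scores a user-item pair with the sigmoid of the dot product of their vectors. This scoring rule is a common choice and is an assumption here, not necessarily what the BipartiteGraphSage implementation uses internally; the function name `link_probability` and the toy vectors are made up for the example.

```python
import numpy as np


def link_probability(u_emb, i_emb):
    """Score a user-item pair as sigmoid(dot(u_emb, i_emb)) (assumed scoring rule)."""
    logit = float(np.dot(u_emb, i_emb))
    return 1.0 / (1.0 + np.exp(-logit))


# Toy 4-dimensional embeddings, invented for illustration only.
user = np.array([0.5, -0.2, 0.1, 0.9])
similar_item = np.array([0.4, -0.1, 0.0, 1.0])
dissimilar_item = np.array([-0.6, 0.3, 0.2, -0.8])

p_similar = link_probability(user, similar_item)
p_dissimilar = link_probability(user, dissimilar_item)
print(p_similar, p_dissimilar)
```

An item whose embedding points in a similar direction to the user's gets a score above 0.5, while one pointing the opposite way gets a score below 0.5.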

This tutorial has the following steps:

  • Launching the learning engine and attaching it to the loaded graph
  • Defining the training process with the builtin BipartiteGraphSage model and hyperparameters
  • Training and evaluating
In [ ]:
# Install graphscope package if you are NOT in the Playground

!pip3 install graphscope
!pip3 uninstall -y importlib_metadata  # Address a module conflict issue on colab.google. Remove this line if you are not on colab.
In [ ]:
# Import the graphscope module.

import graphscope

graphscope.set_option(show_log=False)  # disable logging; set show_log=True to enable it
In [ ]:
# Load u2i dataset

from graphscope.dataset import load_u2i

graph = load_u2i()

Launch learning engine

Next, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose the "feature" property as the training feature.

With the feature list, we then launch a learning engine with the graphlearn method of graphscope.

In this case, we specify BipartiteGraphSage training over the "u" and "i" nodes and the "u-i" edges.

In [ ]:
# launch a learning engine.

lg = graphscope.graphlearn(
    graph,
    nodes=[("u", ["feature"]), ("i", ["feature"])],
    edges=[(("u", "u-i", "i"), ["weight"]), (("i", "u-i_reverse", "u"), ["weight"])],
)

We use the builtin BipartiteGraphSage model to define the training process. You can find more details about all the builtin learning models in Graph Learning Model.

In this example, we use TensorFlow as the NN backend trainer.

In [ ]:
import numpy as np
import tensorflow as tf
import graphscope.learning
from graphscope.learning.examples import BipartiteGraphSage
from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer
from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer


# Unsupervised GraphSage.
def train(config, graph):
    def model_fn():
        return BipartiteGraphSage(
            graph,
            config["batch_size"],
            config["hidden_dim"],
            config["output_dim"],
            config["hops_num"],
            config["u_neighs_num"],
            config["i_neighs_num"],
            u_features_num=config["u_features_num"],
            u_categorical_attrs_desc=config["u_categorical_attrs_desc"],
            i_features_num=config["i_features_num"],
            i_categorical_attrs_desc=config["i_categorical_attrs_desc"],
            neg_num=config["neg_num"],
            use_input_bn=config["use_input_bn"],
            act=config["act"],
            agg_type=config["agg_type"],
            need_dense=config["need_dense"],
            in_drop_rate=config["drop_out"],
            ps_hosts=config["ps_hosts"],
        )

    graphscope.learning.reset_default_tf_graph()
    trainer = LocalTFTrainer(
        model_fn,
        epoch=config["epoch"],
        optimizer=get_tf_optimizer(
            config["learning_algo"], config["learning_rate"], config["weight_decay"]
        ),
    )

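    # Run training, then fetch the learned embeddings for both node types
    # and save them to u_emb.npy and i_emb.npy in the working directory.
    trainer.train()
    u_embs = trainer.get_node_embedding("u")
    np.save("u_emb", u_embs)
    i_embs = trainer.get_node_embedding("i")
    np.save("i_emb", i_embs)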


# Define hyperparameters
config = {
    "batch_size": 128,
    "hidden_dim": 128,
    "output_dim": 128,
    "u_features_num": 1,
    "u_categorical_attrs_desc": {"0": ["u_id", 10000, 64]},
    "i_features_num": 1,
    "i_categorical_attrs_desc": {"0": ["i_id", 10000, 64]},
    "hops_num": 1,
    "u_neighs_num": [10],
    "i_neighs_num": [10],
    "neg_num": 10,
    "learning_algo": "adam",
    "learning_rate": 0.001,
    "weight_decay": 0.0005,
    "epoch": 5,
    "use_input_bn": True,
    "act": tf.nn.leaky_relu,
    "agg_type": "gcn",
    "need_dense": True,
    "drop_out": 0.0,
    "ps_hosts": None,
}
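
Some of these hyperparameters depend on each other. The sketch below is a hypothetical sanity check, not part of GraphScope: it assumes that "u_neighs_num" and "i_neighs_num" list one neighbor count per hop (so their length should equal "hops_num") and that each categorical attribute description has the form [name, number of buckets, embedding dimension]. Both assumptions match the values used above but should be confirmed against the model documentation.

```python
def check_sampling_config(config):
    """Return a list of problems found in the config (hypothetical helper)."""
    problems = []
    hops = config["hops_num"]
    # Assumption: one neighbor count per hop for each node type.
    for key in ("u_neighs_num", "i_neighs_num"):
        if len(config[key]) != hops:
            problems.append(f"{key} has {len(config[key])} entries, expected {hops}")
    # Assumption: each description is [name, bucket count, embedding dimension].
    for key in ("u_categorical_attrs_desc", "i_categorical_attrs_desc"):
        for column, desc in config[key].items():
            if len(desc) != 3:
                problems.append(f"{key}[{column!r}] should have 3 fields, got {len(desc)}")
    return problems


# A subset of the configuration above, repeated so the sketch runs on its own.
example_config = {
    "hops_num": 1,
    "u_neighs_num": [10],
    "i_neighs_num": [10],
    "u_categorical_attrs_desc": {"0": ["u_id", 10000, 64]},
    "i_categorical_attrs_desc": {"0": ["i_id", 10000, 64]},
}
print(check_sampling_config(example_config))
```

Running such a check before training can catch a mismatch, for example raising "hops_num" to 2 without adding a second entry to the neighbor lists.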

Run training process

After defining the training process and hyperparameters, we can start the training process with the learning engine "lg" and the hyperparameter configuration.

In [ ]:
train(config, lg)
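
Once training finishes, the saved embeddings can be used to rank candidate items for a user. The following is a minimal sketch under stated assumptions: it uses synthetic arrays in place of the real files, treats every row as a plain embedding vector, and ranks by dot product. The helper `top_k_items` is hypothetical, and if the arrays returned by `get_node_embedding` carry extra columns (such as node ids), those would need to be stripped first.

```python
import numpy as np


def top_k_items(u_vec, i_embs, k=5):
    """Return the row indices of the k items with the highest dot-product score."""
    scores = i_embs @ u_vec
    k = min(k, len(scores))
    # argsort is ascending, so take the last k indices and reverse them.
    return np.argsort(scores)[-k:][::-1]


# Synthetic stand-ins for the saved arrays. With real results, one would
# load them with np.load("u_emb.npy") and np.load("i_emb.npy") instead.
rng = np.random.default_rng(0)
u_embs = rng.normal(size=(3, 128))
i_embs = rng.normal(size=(20, 128))

recommended = top_k_items(u_embs[0], i_embs, k=5)
print(recommended)
```

The returned indices refer to rows of the item embedding array, ordered from the best-scoring item downwards.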