GraphScope can run graph learning tasks. In this tutorial, we demonstrate how GraphScope trains a model with GraphSAGE.
The task is link prediction, which estimates the probability that a link exists between two nodes in a graph.
We use our implementation of the GraphSAGE algorithm to build a model that predicts protein-protein links in the PPI dataset, in which every node represents a protein. The task can be treated as unsupervised link prediction on a homogeneous network.
Here, the GraphSAGE algorithm compresses both the structural and the attribute information in the graph into a low-dimensional embedding vector for each node. These embeddings can then be used to predict links between nodes.
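To make the last point concrete, here is a minimal sketch of how a pair of node embeddings could be turned into a link score. It assumes a dot product followed by a sigmoid, which is a common choice but not something this tutorial prescribes. The `link_score` helper and the embedding values are made up for illustration.

```python
import math

def link_score(src_emb, dst_emb):
    """Hypothetical helper: squash the dot product of two embeddings into (0, 1)."""
    logit = sum(s * d for s, d in zip(src_emb, dst_emb))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dimensional embeddings (illustrative values, not model output).
protein_a = [0.5, -0.1, 0.3, 0.8]
protein_b = [0.4, 0.0, 0.2, 0.9]
protein_c = [-0.6, 0.7, -0.3, -0.5]

print(link_score(protein_a, protein_b))  # similar vectors give a score above 0.5
print(link_score(protein_a, protein_c))  # dissimilar vectors give a score below 0.5
```

A higher score means the model considers a link between the two proteins more likely under this assumed scoring rule.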
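This tutorial has the following steps: loading the PPI graph, launching a learning engine on it, defining the training process with the built-in GraphSAGE model and its hyperparameters, and training the model.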
# Install graphscope package if you are NOT in the Playground
!pip3 install graphscope
# Import the graphscope module.
import graphscope
graphscope.set_option(show_log=False)  # disable logging; set show_log=True to enable it
# Load ppi dataset
from graphscope.dataset import load_ppi
graph = load_ppi()
Then, we need to define a feature list for training. The training features must be selected from the vertex properties. In this case, we choose all the properties prefixed with "feat-" as the training features.
With the feature list, we next launch a learning engine with the graphlearn method of graphscope.
In this case, we specify GraphSAGE training over "protein" nodes and "link" edges.
With gen_labels, we take all the protein nodes as the training set.
# define the features for learning
protein_features = []
for i in range(50):
    protein_features.append("feat-" + str(i))
# launch a learning engine.
lg = graphscope.graphlearn(
    graph,
    nodes=[("protein", protein_features)],
    edges=[("protein", "link", "protein")],
    gen_labels=[
        ("train", "protein", 100, (0, 100)),
    ],
)
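The `gen_labels` tuple above can be read as (label name, node type, number of parts, range of parts). The sketch below illustrates that reading with a hypothetical `split_by_range` helper that buckets node ids with a modulo. It only shows why the range `(0, 100)` out of 100 parts selects every node; the reading of the tuple is an assumption, and this is not how GraphScope performs the split internally.

```python
def split_by_range(node_ids, num_parts, part_range):
    """Hypothetical helper: keep nodes whose bucket index falls in [start, end)."""
    start, end = part_range
    return [n for n in node_ids if start <= n % num_parts < end]

node_ids = list(range(1000))  # stand-in ids, not real PPI vertices

train_all = split_by_range(node_ids, 100, (0, 100))
train_75 = split_by_range(node_ids, 100, (0, 75))

print(len(train_all))  # 1000
print(len(train_75))   # 750
```

Under this reading, a narrower range such as `(0, 75)` would leave part of the nodes out of the training set.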
We use the built-in GraphSAGE model to define the training process.
In this example, we use TensorFlow as the neural network (NN) backend for training.
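For intuition about what a GraphSAGE layer computes, the following is a minimal sketch of one mean-aggregation step in plain Python. It illustrates the general idea (combine a node's own features with the average of its sampled neighbors' features) and is not the `EgoGraphSAGE` implementation. The toy graph, the feature values, and the helper names `sample_neighbors` and `mean_aggregate` are assumptions, and the learned weight matrices and nonlinearity are omitted.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Sample k neighbors with replacement so every node gets a fixed-size set."""
    return [rng.choice(adj[node]) for _ in range(k)]

def mean_aggregate(features, node, neighbors):
    """Concatenate the node's own features with the mean of its neighbors' features."""
    dim = len(features[node])
    mean = [sum(features[n][i] for n in neighbors) / len(neighbors) for i in range(dim)]
    return features[node] + mean

# Toy triangle graph with 2-dimensional features (illustrative values).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

rng = random.Random(0)
nbrs = sample_neighbors(adj, 0, 5, rng)
h0 = mean_aggregate(features, 0, nbrs)
print(len(nbrs), len(h0))  # 5 4
```

Sampling a fixed number of neighbors per hop keeps the input shape constant across nodes, which is why the neighbor counts are given as one number per hop.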
try:
    # https://www.tensorflow.org/guide/migrate
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except ImportError:
    import tensorflow as tf
import argparse
import graphscope.learning.graphlearn.python.nn.tf as tfg
from graphscope.learning.examples import EgoGraphSAGE
from graphscope.learning.examples import EgoSAGEUnsupervisedDataLoader
from graphscope.learning.examples.tf.trainer import LocalTrainer
def parse_args():
    argparser = argparse.ArgumentParser("Train EgoSAGE Unsupervised.")
    argparser.add_argument('--batch_size', type=int, default=512)
    argparser.add_argument('--features_num', type=int, default=50)
    argparser.add_argument('--hidden_dim', type=int, default=128)
    argparser.add_argument('--output_dim', type=int, default=128)
    # One neighbor count per hop; nargs='+' parses "--nbrs_num 5 5" into [5, 5]
    # (type=list would split a command-line string into single characters).
    argparser.add_argument('--nbrs_num', type=int, nargs='+', default=[5, 5])
    argparser.add_argument('--learning_rate', type=float, default=0.01)
    argparser.add_argument('--epochs', type=int, default=2)
    argparser.add_argument('--drop_out', type=float, default=0.0)
    argparser.add_argument('--temperature', type=float, default=0.07)
    argparser.add_argument('--node_type', type=str, default="protein")
    argparser.add_argument('--edge_type', type=str, default="link")
    # parse_known_args ignores unrecognized arguments, such as the ones a
    # Jupyter kernel places in sys.argv, so the defaults apply in a notebook.
    args, _ = argparser.parse_known_args()
    return args
args = parse_args()
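# Define Model
# Layer sizes: input features, one hidden layer per additional hop, then the output.
# With the defaults above this evaluates to [50, 128, 128].
dims = [args.features_num] + [args.hidden_dim] * (len(args.nbrs_num) - 1) + [args.output_dim]
model = EgoGraphSAGE(dims, dropout=args.drop_out)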
# Prepare train dataset
train_data = EgoSAGEUnsupervisedDataLoader(
    lg, None, batch_size=args.batch_size,
    node_type=args.node_type, edge_type=args.edge_type, nbrs_num=args.nbrs_num)
# Embed the source, destination, and negative destination ego graphs
src_emb = model.forward(train_data.src_ego)
dst_emb = model.forward(train_data.dst_ego)
neg_dst_emb = model.forward(train_data.neg_dst_ego)
# Unsupervised loss over the positive and negative pairs
loss = tfg.unsupervised_softmax_cross_entropy_loss(
    src_emb, dst_emb, neg_dst_emb, temperature=args.temperature)
optimizer = tf.train.AdamOptimizer(learning_rate=args.learning_rate)
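To give a feel for the role of `temperature` in the loss above, here is a minimal plain-Python sketch of a softmax cross-entropy over one positive and several negative similarity scores. It reflects an assumption about the general form of such a loss and is written for illustration only; the actual computation inside `tfg.unsupervised_softmax_cross_entropy_loss` may differ in details such as normalization and how negatives are arranged. The helper names and toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_ce_loss(src, pos_dst, neg_dsts, temperature):
    """Cross-entropy that treats the positive pair as the correct class."""
    logits = [dot(src, pos_dst) / temperature]
    logits += [dot(src, neg) / temperature for neg in neg_dsts]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Toy 2-dimensional embeddings (illustrative values, not model output).
src = [1.0, 0.0]
pos = [0.9, 0.1]
negs = [[0.1, 0.9], [-0.5, 0.5]]

loss_t1 = softmax_ce_loss(src, pos, negs, temperature=1.0)
loss_t007 = softmax_ce_loss(src, pos, negs, temperature=0.07)
print(loss_t1, loss_t007)  # the second value is smaller: a low temperature sharpens the softmax
```

In this sketch, dividing the scores by a small temperature widens the gap between the positive and negative logits, so a correctly ranked positive pair is penalized less.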
After defining the training process and hyperparameters, we can start training.
trainer = LocalTrainer()
trainer.train(train_data.iterator, loss, optimizer, epochs=args.epochs)