Run GraphScope like NetworkX

GraphScope provides a set of graph analysis interfaces compatible with NetworkX.

In this article, we show how to use GraphScope to perform graph analysis the same way you would with NetworkX.

How does NetworkX perform graph analysis?

Usually, the graph analysis process of NetworkX starts with the construction of a graph.

In the following example, we first create an empty graph and then add data to it through the NetworkX interface.

In [ ]:
# Install graphscope package if you are NOT in the Playground

!pip3 install graphscope
In [ ]:
import networkx
In [ ]:
# Initialize an empty graph
G = networkx.Graph()

# Add edges (1, 2) and (1, 3) by `add_edges_from` interface
G.add_edges_from([(1, 2), (1, 3)])

# Add vertex 4 by `add_node` interface
G.add_node(4)

Then we can query the graph information.

In [ ]:
# Query the number of vertices by `number_of_nodes` interface.
G.number_of_nodes()
In [ ]:
# Similarly, query the number of edges by `number_of_edges` interface.
G.number_of_edges()
In [ ]:
# Query the degree of each vertex by `degree` interface.
sorted(d for n, d in G.degree())

Finally, we call the built-in algorithms of NetworkX to analyze the graph G.

In [ ]:
# Run 'connected components' algorithm
list(networkx.connected_components(G))
In [ ]:
# Run 'clustering' algorithm
networkx.clustering(G)
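
For intuition about what the connected-components call returns, the search can be sketched in plain Python over an adjacency dictionary. This is only an illustrative breadth-first search, not how NetworkX or GraphScope implement the algorithm; the function name `bfs_components` and the `adj` dictionary are hypothetical stand-ins written for this sketch.

```python
from collections import deque


def bfs_components(adj):
    """Yield each connected component of an undirected graph as a set of vertices."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Grow one component outward from `start` with a breadth-first search.
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        yield component


# Same graph as above: edges (1, 2) and (1, 3), plus the isolated vertex 4.
adj = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}
components = list(bfs_components(adj))
print(components)  # [{1, 2, 3}, {4}]
```

Vertex 4 has no edges, so it forms a component of its own.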

How to use NetworkX interface from GraphScope

Graph Building

To use the NetworkX interface from GraphScope, we just need to replace import networkx as nx with import graphscope.nx as nx.

Here we use the nx.Graph() interface to create an empty undirected graph.

In [ ]:
import graphscope
graphscope.set_option(show_log=True)
import graphscope.nx as nx

# Initialize an empty graph
G = nx.Graph()

Add edges and vertices

Just as in NetworkX, you can add vertices with add_node or add_nodes_from, and add edges with add_edge or add_edges_from.

In [ ]:
# Add one vertex by `add_node` interface
G.add_node(1)

# Or add a batch of vertices from an iterable
G.add_nodes_from([2, 3])

# Also you can add attributes while adding vertices
G.add_nodes_from([(4, {"color": "red"}), (5, {"color": "green"})])

# Similarly, add one edge by `add_edge` interface
G.add_edge(1, 2)
e = (2, 3)
G.add_edge(*e)

# Or add a batch of edges from an iterable
G.add_edges_from([(1, 2), (1, 3)])

# Add attributes while adding edges
G.add_edges_from([(1, 2), (2, 3, {'weight': 3.1415})])

Query Graph

Just as in NetworkX, you can query the number of vertices and edges with the number_of_nodes and number_of_edges interfaces, or query the neighbors of a vertex with the adj interface.

In [ ]:
# Query the number of vertices by `number_of_nodes` interface.
G.number_of_nodes()
In [ ]:
# Similarly, query the number of edges by `number_of_edges` interface.
G.number_of_edges()
In [ ]:
# list the vertices in graph `G`
list(G.nodes)
In [ ]:
# list the edges in graph `G`
list(G.edges)
In [ ]:
# query the neighbors of vertex 1
list(G.adj[1])
In [ ]:
# query the degree of vertex 1
G.degree(1)

Delete

Just as in NetworkX, you can remove vertices with the remove_node or remove_nodes_from interface, and remove edges with the remove_edge or remove_edges_from interface.

In [ ]:
# remove one vertex by `remove_node` interface
G.remove_node(5)
list(G.nodes)
In [ ]:
# remove a batch of vertices by `remove_nodes_from` interface
G.remove_nodes_from([4, 5])
list(G.nodes)
In [ ]:
# remove one edge by `remove_edge` interface
G.remove_edge(1, 2)
list(G.edges)
In [ ]:
# remove a batch of edges by `remove_edges_from` interface
G.remove_edges_from([(1, 3), (2, 3)])
list(G.edges)
In [ ]:
# query the number of vertices after removal
G.number_of_nodes()
In [ ]:
# query the number of edges after removal
G.number_of_edges()

Graph Analysis

The graph analysis interface in GraphScope is also compatible with NetworkX.

In the following examples, we use connected_components to find the connected components of the graph, clustering to get the clustering coefficient of each vertex, and all_pairs_shortest_path to compute the shortest paths between every pair of vertices.

In [ ]:
# Building graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(4)
In [ ]:
# Run connected_components
list(nx.connected_components(G))
In [ ]:
# Run clustering
nx.clustering(G)
In [ ]:
# Run all_pairs_shortest_path
sp = dict(nx.all_pairs_shortest_path(G))
sp[3]
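
To make the clustering result easier to interpret, here is a minimal plain-Python sketch of the local clustering coefficient for an undirected, unweighted graph. For a vertex with k neighbors, it is the fraction of the k(k-1)/2 possible neighbor pairs that are themselves connected. The function `local_clustering` and both adjacency dictionaries are hypothetical names for this illustration, and the code is not the implementation used by NetworkX or GraphScope.

```python
from itertools import combinations


def local_clustering(adj):
    """Return {vertex: clustering coefficient} for an undirected, unweighted graph."""
    result = {}
    for v, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            # Fewer than two neighbors: no neighbor pairs exist, so report 0.
            result[v] = 0.0
            continue
        # Count pairs of neighbors that are directly connected to each other.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        result[v] = 2.0 * links / (k * (k - 1))
    return result


# The graph built above: edges (1, 2) and (1, 3), plus the isolated vertex 4.
tutorial_graph = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}
print(local_clustering(tutorial_graph))

# Adding the edge (2, 3) closes the triangle 1-2-3.
triangle_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: set()}
print(local_clustering(triangle_graph))
```

In the tutorial graph every coefficient is 0 because vertices 2 and 3 are not adjacent; once the triangle is closed, vertices 1, 2, and 3 each reach 1.0.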

Graph Display

As in NetworkX, you can draw a graph with the draw interface, which relies on the drawing functions of Matplotlib.

You should install Matplotlib first if you are not in the Playground environment.

In [ ]:
!pip3 install matplotlib

Use GraphScope to draw a simple graph.

In [ ]:
# Create a star graph with 6 vertices (one center and 5 leaves)
G = nx.star_graph(5)

# Draw
nx.draw(G, with_labels=True, font_weight='bold')

The performance speed-up of GraphScope over NetworkX can reach several orders of magnitude.

Let's see how much GraphScope improves algorithm performance compared with NetworkX in a simple experiment.

We run the clustering algorithm on a Twitter dataset.

Download the dataset if you are not in the Playground environment.

In [ ]:
!wget https://raw.githubusercontent.com/GraphScope/gstest/master/twitter.e -P /tmp

Load the dataset in both GraphScope and NetworkX.

In [ ]:
import os
import graphscope.nx as gs_nx
import networkx as nx
In [ ]:
# loading graph in NetworkX
g1 = nx.read_edgelist(
     os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=nx.Graph
)
type(g1)
In [ ]:
# Loading graph in GraphScope
g2 = gs_nx.read_edgelist(
     os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=gs_nx.Graph
)
type(g2)

Run the algorithm and display the elapsed time in both GraphScope and NetworkX.

In [ ]:
%%time
# GraphScope
ret_gs = gs_nx.clustering(g2)
In [ ]:
%%time
# NetworkX
ret_nx = nx.clustering(g1)
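
The %%time magic only works inside IPython or Jupyter. If you run the experiment as a plain Python script, one option is a small timing helper built on time.perf_counter. The sketch below is an assumption about how you might do that; the helper name `timed` is hypothetical, and the workload is a stand-in so that the example runs without either library installed.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload. In the experiment above, the calls would instead look like
# timed(gs_nx.clustering, g2) and timed(nx.clustering, g1).
total, seconds = timed(sum, range(1_000_000))
print(f"sum took {seconds:.4f} s")
```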
In [ ]:
# Result comparison
ret_gs == ret_nx
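
Comparing two dictionaries of floats with == requires every value to match bit for bit, which can fail if two implementations accumulate floating-point rounding differently. Whether that happens for ret_gs and ret_nx is not something this article states, so treat the following as an optional, more tolerant check. The helper `results_close` and its tolerance defaults are hypothetical choices for this sketch.

```python
import math


def results_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if both dicts have the same keys and numerically close values."""
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[key], b[key], rel_tol=rel_tol, abs_tol=abs_tol)
        for key in a
    )


# Two results that differ only by floating-point rounding.
left = {1: 0.1 + 0.2, 2: 0.0}
right = {1: 0.3, 2: 0.0}

print(left == right)               # False
print(results_close(left, right))  # True
```

With the benchmark results, the equivalent call would be results_close(ret_gs, ret_nx), assuming both are plain dictionaries keyed by vertex.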