To use GraphScope, we need to establish a session first.
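A session encapsulates the control and the state of the GraphScope engines. It serves as the entry point to GraphScope in the Python client. A session allows users to deploy GraphScope on a Kubernetes (k8s) cluster and connect to it.
In this tutorial, we will demonstrate how to launch a default session, how to launch a session with custom parameters, and how to mount a file volume to the cluster.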
First of all, you should import graphscope.
import graphscope
To better understand the launching process, we recommend enabling the show_log option at the package scope.
graphscope.set_option(show_log=True)
A default session can be easily launched, even without any parameters.
s1 = graphscope.session()
Behind the scenes, the session tries to launch a coordinator, which is the entry point for the back-end engines. The coordinator manages a cluster of k8s pods (2 pods by default), and the interactive, analytical, and learning engines run on them. Each pod in the cluster hosts a vineyard instance that serves distributed in-memory data.
Run the cell and take a look at the log; it prints the whole process of launching the session.
The log message "GraphScope coordinator service connected" means that the session has launched successfully and the current Python client has connected to it.
You can also check a session's status by evaluating the session object.
s1
Run this cell and you will find a status field with the value active. Together with the status, it also prints other meta-information about the session, such as the number of workers (pods) and the coordinator endpoint for connection.
A session manages the resources in the cluster, so it is important to release these resources when they are no longer required. To deallocate the resources, call the close method on the session when all graph tasks are finished.
s1.close()
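Because resources stay allocated until close is called, it can help to make the call unconditional. The following is a minimal sketch of that pattern using the standard library's contextlib.closing. FakeSession, run_with_session, and failing_task are hypothetical names introduced only for illustration; FakeSession stands in for a real session and models nothing but close, so the sketch runs without a cluster.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for a GraphScope session; only close() is modeled."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real session would release the cluster resources here.
        self.closed = True


def run_with_session(session, task):
    """Run task(session) and always call session.close() afterwards."""
    with closing(session) as s:
        return task(s)


def failing_task(session):
    # Simulates a graph task that raises partway through.
    raise RuntimeError("graph task failed")


ok_session = FakeSession()
result = run_with_session(ok_session, lambda s: "done")

failed_session = FakeSession()
try:
    run_with_session(failed_session, failing_task)
except RuntimeError:
    pass  # The error still propagates, but close() has already run.
```

With a real cluster, the same helper could presumably be given the object returned by graphscope.session(), since the only method it relies on is close.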
A GraphScope session provides several keyword arguments to configure the cluster.
For example, you may use k8s_gs_image to specify the GraphScope image, or num_workers to specify the number of pods. Use help(graphscope.session) to check all available arguments.
s2 = graphscope.session(num_workers=1, k8s_engine_cpu=1, k8s_engine_mem='4Gi', timeout_seconds=1200)
s2.close()
Parameters can also be passed as a JSON string or a dict.
config = {'num_workers': 1, 'timeout_seconds': 100}
s3 = graphscope.session(config=config)
s3.close()
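As a small illustration of the two forms, the sketch below serializes the same configuration dict to a JSON string with the standard json module and parses it back. It deliberately stops short of calling GraphScope: how the string form is handed to graphscope.session is not shown in this tutorial, so check help(graphscope.session) before relying on it. The variable names config_json and restored are introduced here for illustration.

```python
import json

# Same options as the dict passed to graphscope.session(config=...) above.
config = {"num_workers": 1, "timeout_seconds": 100}

# Serialize to a JSON string, e.g. to keep it in a file next to a deployment.
config_json = json.dumps(config, sort_keys=True)

# Parse it back and keep the result for comparison with the original dict.
restored = json.loads(config_json)
```

Keeping the configuration in one dict and deriving the string from it avoids the two forms drifting apart.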
To save or load data, you may want to mount a file volume to the allocated cluster.
For example, we have prepared some sample graph datasets in the host location /testingdata. You can mount it at the path /home/jovyan/datasets, so that the pods can access the testing data.
Note that the path /testingdata on the server is a copy of /home/jovyan/datasets in your home directory, so any local modification will not affect the directory mounted on the server.
k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {
            "path": "/testingdata",
            "type": "Directory"
        },
        "mounts": {
            "mountPath": "/home/jovyan/datasets",
            "readOnly": True
        }
    }
}
s4 = graphscope.session(k8s_volumes=k8s_volumes)
s4.close()
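When several volumes are needed, writing the nested dict by hand gets repetitive. The sketch below wraps the structure used above in a helper; host_path_volume is a hypothetical function introduced here, not part of GraphScope, and it only reproduces the keys that appear in this tutorial's example.

```python
def host_path_volume(name, host_path, mount_path, read_only=True):
    """Build a k8s_volumes entry with the same layout as the example above."""
    return {
        name: {
            "type": "hostPath",
            "field": {"path": host_path, "type": "Directory"},
            "mounts": {"mountPath": mount_path, "readOnly": read_only},
        }
    }


# Reproduces the dict from the tutorial.
k8s_volumes = host_path_volume("data", "/testingdata", "/home/jovyan/datasets")
```

The resulting dict can be passed as the k8s_volumes argument in the same way as the handwritten one.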