PyRosettaCluster Tutorial 2 is an example of using multiple user-provided PyRosetta protocols with PyRosettaCluster. Unlike Rosetta's MultiplePoseMover, which executes multiple protocols serially, PyRosettaCluster executes multiple protocols in parallel (provided the cluster has more than one distributed worker). The user defines the order in which the protocols execute. Each Pose or PackedPose object returned from the first user-provided PyRosetta protocol is automatically passed to the second user-provided PyRosetta protocol, and so on. That is, protocol1 returns a Pose object, which is then used as input for protocol2; protocol2 returns a new Pose object, which is then used as input for protocol3, and so on. Only Pose objects returned by the final protocol are written to disk, unless the user specifies PyRosettaCluster(..., save_all=True, ...), in which case all intermediate decoys are also written to disk. Each decoy contains all of the relevant information needed to reproduce it.
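The hand-off between protocols can be pictured with a small pure-Python sketch. It does not use PyRosetta and does not reproduce PyRosettaCluster's scheduling or parallelism; run_chain, append_a, and append_b are hypothetical names, and plain strings stand in for Pose objects. It only illustrates how each protocol's return value becomes the next protocol's input, and how keeping intermediates (loosely analogous to save_all=True) differs from keeping only the final result.

```python
def run_chain(protocols, start, keep_intermediates=False):
    """Feed the output of each protocol into the next one, in order.

    Hypothetical helper for illustration only; strings stand in for poses.
    """
    kept = []
    current = start
    for number, protocol in enumerate(protocols):
        current = protocol(current)
        is_last = number == len(protocols) - 1
        # Keep every result when asked to, otherwise only the final one.
        if keep_intermediates or is_last:
            kept.append(current)
    return kept


def append_a(pose):
    return pose + "A"


def append_b(pose):
    return pose + "B"


# The same protocol may appear more than once in the chain.
final_only = run_chain([append_a, append_b, append_a], "pose-")
everything = run_chain([append_a, append_b, append_a], "pose-", keep_intermediates=True)
print(final_only)
print(everything)
```

Here final_only holds a single result from the last step, while everything holds one result per step.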
Warning: This notebook uses pyrosetta.distributed.viewer code, which runs in Jupyter Notebook and might not run if you're using JupyterLab.
Note: This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.
Note: This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the --serialization flag or by installing PyRosetta from the RosettaCommons conda channel. Please see Chapter 16.00 for setup instructions.
Note: This Jupyter notebook is intended to be run within Jupyter Lab, but may still be run as a standalone Jupyter notebook.
import bz2
import glob
import json
import logging
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer
from pyrosetta.distributed.cluster import PyRosettaCluster
logging.basicConfig(level=logging.INFO)
dask:
See Tutorial 1A to review:
Inject client code here, then run the cell:
if not os.getenv("DEBUG"):
    from dask.distributed import Client
    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client
User-provided PyRosetta protocols may return Pose or PackedPose objects to be passed on to the next protocol. Protocols that do not return a Pose or PackedPose object are also allowed; for example, a protocol may return None. In such cases, the subsequent protocol receives an empty PackedPose object.
def protocol1(packed_pose_in, **kwargs):
    """
    Repacks the input `PackedPose` object, which can be (a) input to the function
    automatically via the 'packed_pose_in' argument or (b) accessed through the 's'
    `kwargs` keyword argument, depending on the order in which the protocol is
    specified in the PyRosettaCluster.distributed() method.

    Args:
        packed_pose_in: A `PackedPose` object to be repacked. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    logging.info(
        "Now executing protocol number '{0}' called '{1}'.".format(
            kwargs["PyRosettaCluster_protocol_number"],
            kwargs["PyRosettaCluster_protocol_name"]
        )
    )

    # When no pose is passed in, load the structure named by the task's 's' entry.
    if packed_pose_in is None:
        logging.info("Generating `packed_pose_in` from `kwargs['s']`.")
        packed_pose_in = io.pose_from_file(kwargs["s"])
    else:
        logging.info("Using `packed_pose_in` from `args`.")

    # Repack side chains only; RestrictToRepacking disables design.
    xml = """
        <ROSETTASCRIPTS>
          <TASKOPERATIONS>
            <RestrictToRepacking name="restrict_to_repacking"/>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover name="pack" task_operations="restrict_to_repacking" />
          </MOVERS>
          <PROTOCOLS>
            <Add mover="pack"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """

    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())
def protocol2(packed_pose_in, **kwargs):
    """
    Performs sequence design (Thr24Ser) on an input pose.

    Args:
        packed_pose_in: A `PackedPose` object to be designed.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    import pyrosetta
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    xml = """
        <ROSETTASCRIPTS>
          <RESIDUE_SELECTORS>
            <Index name="T24" resnums="24A"/>
            <Not name="not24" selector="T24"/>
          </RESIDUE_SELECTORS>
          <TASKOPERATIONS>
            <ResfileCommandOperation name="design" command="PIKAA S" residue_selector="T24"/>
            <OperateOnResidueSubset name="prevent_repacking" selector="not24">
              <PreventRepackingRLT/>
            </OperateOnResidueSubset>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover name="pack" task_operations="design,prevent_repacking"/>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="pack"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """

    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())
def create_tasks():
    yield {
        "options": "-ex1",
        "extra_options": "-out:level 300 -multithreading:total_threads 1",
        "set_logging_handler": "interactive",
        "s": os.path.join(os.getcwd(), "inputs", "1QYS.pdb"),
    }
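Because create_tasks is a generator, it can in principle yield more than one task dictionary. The sketch below assumes a hypothetical create_tasks_for helper and hypothetical input file names (2ABC.pdb is a placeholder, not a file shipped with this tutorial); it yields one dictionary per input structure while reusing the options shown above. The "s" entry is the value that protocol1 reads as kwargs["s"].

```python
import os


def create_tasks_for(filenames, input_dir):
    """Yield one task dictionary per input file (hypothetical helper)."""
    for filename in filenames:
        yield {
            "options": "-ex1",
            "extra_options": "-out:level 300 -multithreading:total_threads 1",
            "set_logging_handler": "interactive",
            # Path of the structure this task should start from.
            "s": os.path.join(input_dir, filename),
        }


# "2ABC.pdb" is a made-up file name used only to show a second task.
tasks = list(create_tasks_for(["1QYS.pdb", "2ABC.pdb"], "inputs"))
for task in tasks:
    print(task["s"])
```

Keeping the per-task values in the dictionary, rather than hard-coding them inside a protocol, lets the same protocol functions be reused across inputs.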
distribute():
if not os.getenv("DEBUG"):
    output_path = os.path.join(os.getcwd(), "outputs_2")
    PyRosettaCluster(
        tasks=create_tasks,
        client=client,
        scratch_dir=output_path,
        output_path=output_path,
    ).distribute(protocols=[protocol1, protocol2, protocol1])
INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-run:constant_seed 1 -multithreading:total_threads 1', 'extra_options': '-mute all', 'silent': True}
INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/jklima/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!
Gather the input and output decoys from disk into memory:
if not os.getenv("DEBUG"):
    input_file = os.path.join(os.getcwd(), "inputs", "1QYS.pdb")
    output_file = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))[0]
    packed_poses = []
    for pdbfile in [input_file, output_file]:
        if pdbfile.endswith(".bz2"):
            with open(pdbfile, "rb") as f:
                packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
        elif pdbfile.endswith(".pdb"):
            with open(pdbfile, "r") as f:
                packed_poses.append(io.pose_from_pdbstring(f.read()))
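The two file-reading branches above can be factored into a small helper. The sketch below uses only the standard library; read_pdb_text is a hypothetical name, and its return value is the PDB text that the loop above passes to io.pose_from_pdbstring. The demonstration writes a tiny placeholder file to a temporary directory rather than reading real decoys.

```python
import bz2
import os
import tempfile


def read_pdb_text(path):
    """Return the text of a PDB file, decompressing it if the name ends in .bz2."""
    if path.endswith(".bz2"):
        # bz2.open in text mode decompresses and decodes in one step.
        with bz2.open(path, "rt") as f:
            return f.read()
    with open(path, "r") as f:
        return f.read()


# Demonstration with placeholder content (not a real structure).
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "example.pdb")
    packed = os.path.join(tmp, "example.pdb.bz2")
    with open(plain, "w") as f:
        f.write("REMARK placeholder\n")
    with open(packed, "wb") as f:
        f.write(bz2.compress(b"REMARK placeholder\n"))
    texts = [read_pdb_text(plain), read_pdb_text(packed)]

print(texts[0] == texts[1])
```

With such a helper, the gathering loop would reduce to one call per file, followed by io.pose_from_pdbstring on the returned text.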
The original Top7 (PDB ID: 1QYS) decoy and the designed Top7 decoy, with the T24S mutation highlighted, are shown below using the pyrosetta.distributed.viewer visualizer:
if not os.getenv("DEBUG"):
    resi_24 = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector("24A")
    view = viewer.init(packed_poses, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
    view.add(viewer.setStyle(residue_selector=resi_24, colorscheme="magentaCarbon", radius=0.5))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view.add(viewer.setDisulfides(radius=0.25))
    view()
You have successfully run PyRosettaCluster with multiple user-provided PyRosetta protocols!