PyRosettaCluster Tutorial 1B uses the pyrosetta.distributed.cluster Python module to reproduce a decoy generated by a PyRosetta simulation previously run in PyRosettaCluster Tutorial 1A, using only an input .pdb file and the original user-provided PyRosetta protocol(s).
In PyRosettaCluster Tutorial 1A, you used PyRosettaCluster to apply a PyRosetta protocol to an input .pdb file and generated several output .pdb files. Each output .pdb file contains the information needed to reproduce it exactly.
Warning: This notebook uses pyrosetta.distributed.viewer code, which runs in Jupyter Notebook and might not run if you're using JupyterLab.
Note: This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.
Note: This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the --serialization flag or by installing PyRosetta from the RosettaCommons conda channel. Please see Chapter 16.00 for setup instructions.
Note: This Jupyter notebook is intended to be run within Jupyter Lab, but may still be run as a standalone Jupyter notebook.
import bz2
import json
import glob
import logging
import os
import pandas as pd
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer
from pyrosetta.distributed.cluster import PyRosettaCluster, reproduce
logging.basicConfig(level=logging.INFO)
dask
See Tutorial 1A to review:
Click the "Dask" tab in Jupyter Lab (arrow, left)
Click the "+ NEW" button to launch a new compute cluster (arrow, lower)
Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook
Inject client code here, then run the cell:
if not os.getenv("DEBUG"):
    from dask.distributed import Client

    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client
The purpose of the sha1 attribute of PyRosettaCluster is to ensure that you have committed all of your changes to your git repository before executing the original simulation. When you run the reproduce function, the sha1 value that was captured in the output decoy .pdb file during the original simulation is used to ensure that you have checked out the same git SHA1 hash before reproducing the simulation. In this way, my_protocol remains statically captured at the git SHA1 hash from the original simulation. However, you may always update my_protocol, commit your changes to your git repository, and re-run the simulation, because the sha1 attribute of PyRosettaCluster automatically detects the new git SHA1 hash in your git repository.
if not os.getenv("DEBUG"):
    from additional_scripts.my_protocols import my_protocol

    client.upload_file("additional_scripts/my_protocols.py")  # This sends a local file up to all worker nodes.
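The git check described above can be pictured with a small sketch. It is not how PyRosettaCluster implements its sha1 attribute: the helper names (working_tree_is_clean, same_commit, current_head_sha1) are hypothetical, and the sketch assumes the standard `git status --porcelain` and `git rev-parse HEAD` commands are available.

```python
import subprocess


def working_tree_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked file."""
    return porcelain_output.strip() == ""


def same_commit(recorded_sha1: str, current_sha1: str) -> bool:
    """Compare a SHA1 stored with a decoy against the checked-out commit."""
    return recorded_sha1.strip().lower() == current_sha1.strip().lower()


def current_head_sha1(repo_dir: str = ".") -> str:
    """Ask git for the SHA1 of HEAD (requires a git repository at repo_dir)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Inside a repository, `same_commit(recorded, current_head_sha1())` would tell you whether the checked-out commit matches a recorded hash, where `recorded` is whatever value you read from the decoy.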
The simulation in Tutorial 1A generated four decoys (because nstruct=4 in the original simulation). Let's say we'd like to reproduce the decoy with the lowest energy. First, let's inspect the results with the pandas library:
if not os.getenv("DEBUG"):
    original_results = glob.glob(os.path.join(os.getcwd(), "outputs_1A", "decoys", "*", "*.pdb.bz2"))
    data = {}
    for original_result in original_results:
        # Each decoy is a bz2-compressed PDB file.
        with open(original_result, "rb") as f:
            pdbstring = bz2.decompress(f.read()).decode()
        # Search from the end of the file for the PyRosettaCluster record.
        for line in reversed(pdbstring.split("\n")):
            remark = "REMARK PyRosettaCluster: "
            if line.startswith(remark):
                data[original_result] = json.loads(line.split(remark)[-1])["scores"]
                break
    df = pd.DataFrame().from_records(data).T
    df
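To try the parsing logic without the Tutorial 1A outputs, the same steps can be run on a synthetic decoy built in memory. This is a minimal sketch: the atom record, the score value, and the helper name `read_scores` are made up for illustration, and only the "REMARK PyRosettaCluster: " prefix and the "scores" key are taken from the cell above.

```python
import bz2
import json

REMARK = "REMARK PyRosettaCluster: "


def read_scores(compressed_pdb: bytes) -> dict:
    """Return the "scores" entry from the last PyRosettaCluster REMARK line."""
    pdbstring = bz2.decompress(compressed_pdb).decode()
    for line in reversed(pdbstring.split("\n")):
        if line.startswith(REMARK):
            return json.loads(line.split(REMARK)[-1])["scores"]
    raise ValueError("no PyRosettaCluster REMARK line found")


# Hypothetical decoy: one ATOM record followed by a JSON record on a REMARK line.
record = {"scores": {"total_score": -12.5}}
fake_pdb = "\n".join([
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00           C",
    REMARK + json.dumps(record),
    "",
])

scores = read_scores(bz2.compress(fake_pdb.encode()))
print(scores)
```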
Now locate the decoy with the lowest Rosetta total_score to reproduce:
if not os.getenv("DEBUG"):
    decoy_to_reproduce = df.sort_values(by="total_score", ascending=True).index[0]
    decoy_to_reproduce
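The same selection can be made without pandas. The sketch below uses made-up file names and scores in the shape of the `data` dictionary built earlier (decoy path mapped to its scores) and picks the entry with the smallest total_score.

```python
# Hypothetical stand-in for the `data` dictionary: decoy path -> scores.
example_data = {
    "decoy_a.pdb.bz2": {"total_score": -101.2},
    "decoy_b.pdb.bz2": {"total_score": -104.7},
    "decoy_c.pdb.bz2": {"total_score": -99.8},
}

# min() with a key function returns the path whose total_score is lowest.
lowest = min(example_data, key=lambda path: example_data[path]["total_score"])
print(lowest, example_data[lowest]["total_score"])
```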
reproduce():
Reproducing the decoy is accomplished with the reproduce() function of the pyrosetta.distributed.cluster module. This function requires the .pdb or .pdb.bz2 file to reproduce, passed as input_file. Alternatively, a scorefile with full simulation records and a decoy_name may be provided to reproduce() instead of the .pdb or .pdb.bz2 file. The user-provided PyRosetta protocol(s) must be defined or imported and passed to reproduce() as the protocols argument. The user is responsible for supplying the same protocol(s) used in the original simulation! Additionally, any supplied instance_kwargs will override the corresponding PyRosettaCluster instance attributes recorded in the input_file or scorefile. This may be useful when, for example, you want to change your cluster configuration while reproducing a decoy.
if not os.getenv("DEBUG"):
    output_path = os.path.join(os.getcwd(), "outputs_1B")
    reproduce(
        input_file=decoy_to_reproduce,
        input_packed_pose=None,  # Optional, if you used the `input_packed_pose` attribute of `PyRosettaCluster` in the original simulation
        client=client,  # Optional
        instance_kwargs={"output_path": output_path, "nstruct": 1},  # Specify new output path, and set `nstruct` to 1 to reproduce the decoy only once.
        protocols=[my_protocol],
    )
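The override behaviour of instance_kwargs can be pictured as a dictionary merge in which user-supplied values win. The sketch below is only an analogy, not the library's implementation; the recorded attribute values and the helper name `merge_instance_kwargs` are invented for illustration.

```python
def merge_instance_kwargs(recorded: dict, instance_kwargs: dict) -> dict:
    """Return the recorded attributes with any user-supplied overrides applied."""
    merged = dict(recorded)         # copy so the recorded values stay untouched
    merged.update(instance_kwargs)  # user-supplied keys take precedence
    return merged


# Hypothetical attributes recorded with the original decoy.
recorded = {"output_path": "outputs_1A", "nstruct": 4}
overrides = {"output_path": "outputs_1B", "nstruct": 1}

merged = merge_instance_kwargs(recorded, overrides)
print(merged)
```

Keys that are not overridden keep their recorded values, which is the behaviour you would want when changing only the output location or cluster configuration.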
if not os.getenv("DEBUG"):
    reproduced_results = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))
    assert len(reproduced_results) == 1
    with open(reproduced_results[0], "rb") as f:
        reproduced_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())
if not os.getenv("DEBUG"):
    view = viewer.init(reproduced_packed_pose, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view.add(viewer.setDisulfides(radius=0.25))
    view()
PyRosetta trajectories are deterministic given the input random number generator seed(s)!
if not os.getenv("DEBUG"):
    with open(decoy_to_reproduce, "rb") as f:
        original_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())
    original_pose = original_packed_pose.pose
    reproduced_pose = reproduced_packed_pose.pose
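Seed-dependent determinism can be illustrated with Python's own random module. This toy random walk is an analogy only (it does not use Rosetta's random number generator): two runs with the same seed produce identical trajectories, while a different seed generally does not.

```python
import random


def toy_trajectory(seed: int, steps: int = 5) -> list:
    """Random walk driven by a private generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    position = 0.0
    path = []
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)
        path.append(position)
    return path


first = toy_trajectory(seed=1234)
second = toy_trajectory(seed=1234)
other = toy_trajectory(seed=4321)

print(first == second)  # same seed, same trajectory
print(first == other)   # different seed
```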
if not os.getenv("DEBUG"):
    assert original_pose.sequence() == reproduced_pose.sequence()
total_scores are identical:
if not os.getenv("DEBUG"):
    scorefxn = pyrosetta.create_score_function("ref2015.wts")
    assert scorefxn(original_pose) == scorefxn(reproduced_pose)
0.0 Å:
Note: There is no need to first superimpose the original_pose and reproduced_pose because they were both generated starting from the same input_packed_pose.
if not os.getenv("DEBUG"):
    assert pyrosetta.rosetta.core.scoring.CA_rmsd(original_pose, reproduced_pose) == 0.0
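For intuition, a root-mean-square deviation computed directly on coordinates, with no superposition step, is zero only when every pair of matched atoms coincides. The sketch below uses made-up coordinates and a hypothetical helper `rmsd_no_superposition`; it is not the implementation behind CA_rmsd.

```python
import math


def rmsd_no_superposition(coords_a, coords_b) -> float:
    """RMSD over matched (x, y, z) tuples, without aligning the two sets first."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates in angstroms.
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
identical = list(original)
shifted = [(x + 1.0, y, z) for x, y, z in original]

print(rmsd_no_superposition(original, identical))
print(rmsd_no_superposition(original, shifted))
```

Identical coordinates give exactly zero, while a rigid shift of the same structure gives a nonzero value, which is why a shared starting frame matters when no alignment is performed.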
You have successfully reproduced a PyRosetta simulation using the pyrosetta.distributed.cluster module!