PyRosettaCluster Tutorial 4 is an example of how to use a non-canonical residue or ligand .params file with PyRosettaCluster. If a structure contains a ligand that requires a .params file, then PyRosetta must be initialized prior to job distribution with PyRosettaCluster. For reproducibility outside of PyRosettaCluster, PyRosetta should always be initialized with a constant seed.
Warning: This notebook uses pyrosetta.distributed.viewer code, which runs in Jupyter Notebook and might not run if you're using Jupyter Lab.
Note: This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.
Note: This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the --serialization flag or by installing PyRosetta from the RosettaCommons conda channel. Please see Chapter 16.00 for setup instructions.
Note: This Jupyter notebook is intended to be run within Jupyter Lab, but may still be run as a standalone Jupyter notebook.
import bz2
import glob
import logging
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer
from pyrosetta.distributed.cluster import PyRosettaCluster
logging.basicConfig(level=logging.INFO)
dask:
See Tutorial 1A for review.
Inject client code here, then run the cell:
if not os.getenv("DEBUG"):
    from dask.distributed import Client
    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client
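One detail of the `os.getenv("DEBUG")` guard used throughout this notebook is easy to miss: the check is a plain truthiness test on a string, so any non-empty value (including `0`) switches the cells into debug mode. The sketch below illustrates this with a hypothetical helper, `debug_enabled`, which is not part of PyRosetta or this tutorial; it takes a dictionary so the behaviour can be shown without modifying the real environment.

```python
import os


def debug_enabled(environ=None):
    """Mirror the notebook's `os.getenv("DEBUG")` check (hypothetical helper).

    Any non-empty string counts as "debug on"; unset or empty counts as "off".
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("DEBUG"))


# Any non-empty string is truthy, so DEBUG=0 still enables debug mode.
print(debug_enabled({"DEBUG": "1"}))  # True
print(debug_enabled({"DEBUG": "0"}))  # True
print(debug_enabled({"DEBUG": ""}))   # False
print(debug_enabled({}))              # False
```

To run the real simulation cells, leave DEBUG unset (or empty) rather than setting it to 0.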
.params file(s) and initialize PyRosetta with a constant seed:
The -run:constant_seed 1 flag defines a default constant seed of 1111111 and is necessary for reproducibility of your simulation! Initialization is necessary prior to distributing jobs that return a Pose or PackedPose with ligand or non-canonical residues. If you do not properly initialize PyRosetta within the Jupyter Notebook, then your Jupyter Notebook kernel may die and the job distribution may fail.
if not os.getenv("DEBUG"):
    params = os.path.join(os.getcwd(), "inputs", "TPA.am1-bcc.fa.params")
    pyrosetta.distributed.init(f"-extra_res_fa {params} -run:constant_seed 1 -multithreading:total_threads 1")
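When a structure needs more than one .params file, the flag string can be assembled programmatically. The following is a minimal sketch, not part of the tutorial: `build_init_flags` is a hypothetical helper that checks that each file exists and joins the paths after `-extra_res_fa`, which takes a space-separated list of files. It assumes the paths contain no spaces, and it reuses the constant-seed and threading flags from the cell above as defaults.

```python
import os


def build_init_flags(
    params_files,
    extra_flags="-run:constant_seed 1 -multithreading:total_threads 1",
):
    """Return a PyRosetta flag string that loads every given .params file.

    Hypothetical helper (not a PyRosetta function). Assumes no path contains
    whitespace, because the flags are passed as one space-separated string.
    """
    missing = [p for p in params_files if not os.path.isfile(p)]
    if missing:
        # Fail early, before PyRosetta is initialized with a bad path.
        raise FileNotFoundError(f"Missing .params file(s): {', '.join(missing)}")
    return f"-extra_res_fa {' '.join(params_files)} {extra_flags}"


# Intended usage inside the notebook (requires PyRosetta, so left commented out):
# pyrosetta.distributed.init(build_init_flags([params]))
```

Checking the paths first gives a readable Python error instead of a failure during initialization.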
def protocol1(packed_pose_in=None, **kwargs):
    """
    Relax residue 1X (i.e. the ligand).

    Args:
        packed_pose_in: A `PackedPose` object. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    # Repack and minimize only the ligand (residue 1X); all other residues stay fixed.
    xml = """
    <ROSETTASCRIPTS>
      <RESIDUE_SELECTORS>
        <Index name="ligand_selector" resnums="1X"/>
        <Not name="not_ligand_selector" selector="ligand_selector"/>
      </RESIDUE_SELECTORS>
      <TASKOPERATIONS>
        <ResfileCommandOperation name="repack_ligand" command="NATAA" residue_selector="ligand_selector"/>
        <OperateOnResidueSubset name="prevent_repacking" selector="not_ligand_selector">
          <PreventRepackingRLT/>
        </OperateOnResidueSubset>
      </TASKOPERATIONS>
      <MOVERS>
        <FastRelax name="relax" task_operations="repack_ligand,prevent_repacking">
          <MoveMap bb="0" chi="0" jump="1">
            <ResidueSelector selector="ligand_selector" chi="1" bb="1" bondangle="0" bondlength="0"/>
          </MoveMap>
        </FastRelax>
      </MOVERS>
      <PROTOCOLS>
        <Add mover="relax"/>
      </PROTOCOLS>
    </ROSETTASCRIPTS>
    """
    # The input structure is read from the path stored under the task key "s".
    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(io.pose_from_file(kwargs["s"]))
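The protocol above hard-codes the ligand position as `1X`. If the ligand sits at a different residue, the selector portion of the XML could be generated from a template. The sketch below is an illustration only: `XML_TEMPLATE` and `make_selector_xml` are hypothetical names, the template contains just the selector section rather than the full protocol, and the check uses Python's standard XML parser, which confirms the string is well-formed but does not validate it against the RosettaScripts schema.

```python
import xml.etree.ElementTree as ET

# Selector section only; the full protocol would also need the task operations,
# movers and protocols blocks shown in protocol1.
XML_TEMPLATE = """
<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <Index name="ligand_selector" resnums="{resnums}"/>
    <Not name="not_ligand_selector" selector="ligand_selector"/>
  </RESIDUE_SELECTORS>
</ROSETTASCRIPTS>
"""


def make_selector_xml(resnums="1X"):
    """Fill in the ligand residue and confirm the result is well-formed XML."""
    xml = XML_TEMPLATE.format(resnums=resnums).strip()
    # Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    ET.fromstring(xml)
    return xml


print(make_selector_xml("1X"))
```

Catching a malformed string locally is cheaper than discovering it after the job has been distributed to a worker.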
distribute():
if not os.getenv("DEBUG"):
    my_task = {
        "options": "-ex1",
        "extra_options": f"-out:level 300 -multithreading:total_threads 1 -extra_res_fa {params}",
        "s": os.path.join(os.getcwd(), "inputs", "test_lig.pdb"),
    }
    output_path = os.path.join(os.getcwd(), "outputs_4")

    PyRosettaCluster(
        tasks=my_task,
        client=client,
        scratch_dir=output_path,
        output_path=output_path,
    ).distribute(protocols=[protocol1])
While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!
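The run above submits a single task dictionary. To relax the ligand in several input structures, one task per file could be generated instead. The sketch below assumes that the `tasks` argument of PyRosettaCluster also accepts a list of task dictionaries (please confirm this in the PyRosettaCluster documentation for your version); `create_tasks` is a hypothetical helper, and the keys and flag values are copied from the task dictionary in this tutorial.

```python
import os


def create_tasks(pdb_files, params):
    """Yield one task dictionary per input PDB file (hypothetical helper)."""
    for pdb_file in pdb_files:
        yield {
            "options": "-ex1",
            # Each worker must also load the ligand .params file.
            "extra_options": f"-out:level 300 -multithreading:total_threads 1 -extra_res_fa {params}",
            # Read back inside the protocol as kwargs["s"].
            "s": pdb_file,
        }


tasks = list(
    create_tasks(
        [os.path.join("inputs", "test_lig.pdb")],
        os.path.join("inputs", "TPA.am1-bcc.fa.params"),
    )
)
print(len(tasks))  # 1
```

Under that assumption, the resulting list would be passed as `tasks=tasks` in place of `tasks=my_task`.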
Gather the input and output decoys from disk into memory:
if not os.getenv("DEBUG"):
    input_file = os.path.join(os.getcwd(), "inputs", "test_lig.pdb")
    output_files = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))
    packed_poses = []
    for pdbfile in [input_file] + output_files:
        if pdbfile.endswith(".bz2"):
            # Output decoys are bz2-compressed PDB files.
            with open(pdbfile, "rb") as f:
                packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
        elif pdbfile.endswith(".pdb"):
            with open(pdbfile, "r") as f:
                packed_poses.append(io.pose_from_pdbstring(f.read()))
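The two branches in the loop above differ only in how the file text is read. As an optional simplification, the reading step could be factored into one function using `bz2.open` in text mode from the standard library. `read_pdb_text` is a hypothetical helper name, and the sketch covers only the file handling, so it runs without PyRosetta.

```python
import bz2


def read_pdb_text(path):
    """Return the PDB text from a plain .pdb file or a bz2-compressed one."""
    if path.endswith(".bz2"):
        # Text mode ("rt") decompresses and decodes in one step.
        with bz2.open(path, "rt") as f:
            return f.read()
    with open(path, "r") as f:
        return f.read()


# Intended usage inside the notebook (requires PyRosetta, so left commented out):
# packed_poses = [io.pose_from_pdbstring(read_pdb_text(p)) for p in [input_file] + output_files]
```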
View the poses in memory:
if not os.getenv("DEBUG"):
    # Select the ligand (chain X) so it can be styled separately.
    chX = pyrosetta.rosetta.core.select.residue_selector.ChainSelector("X")

    view = viewer.init(packed_poses, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(residue_selector=chX, colorscheme="magentaCarbon", radius=0.35))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view()
You have successfully executed a PyRosetta simulation that modifies a ligand residue with PyRosettaCluster!