This notebook contains material from PyRosetta; content is available on Github.

Distributed analysis example: exhaustive ddG PSSM


This tutorial will walk you through how to generate an exhaustive ddG PSSM in PyRosetta using the PyData stack for analysis and distributed computing.

This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.


Please see setup instructions in Chapter 16.00


Integration of the Rosetta Suite with the Python Software Stack via reproducible packaging and core programming interfaces for distributed simulation

Alexander S. Ford, Brian D. Weitzner, Christopher D. Bahl


Documentation for the pyrosetta.distributed namespace can be found here:

In [ ]:
import sys
if 'google.colab' in sys.modules:
    print("This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.")
In [1]:
import logging
In [2]:
import pandas
import seaborn
import matplotlib
In [3]:
import Bio.SeqUtils
import Bio.Data.IUPACData as IUPACData
In [4]:
import pyrosetta
import as io
import pyrosetta.distributed.packed_pose as packed_pose
import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
import pyrosetta.distributed.tasks.score as score
In [5]:
import os,sys,platform


Create test pose, initialize rosetta and pack

In [6]:
input_protocol = """
    <RestrictToRepacking name="only_pack"/>

    <PackRotamersMover name="pack" task_operations="only_pack" />
    <Add mover="pack"/>
In [7]:
input_relax = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)
# Syntax check via setup
INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'extra_options': '-out:levels all:warning', 'silent': True}
INFO:pyrosetta.rosetta:Found rosetta database at: /Users/jadolfbr/anaconda3/envs/PyRosetta.notebooks/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2019 [Rosetta PyRosetta4.conda.mac.python37.Release 2019.50+release.91b7a940f06ab065a81d9ce3046b08eef0de0b31 2019-12-12T23:03:24] retrieved from:
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
In [8]:
raw_input_pose = score.ScorePoseTask()(io.pose_from_sequence("TESTESTEST"))
input_pose = input_relax(raw_input_pose)

Perform exhaustive point mutation and pack

In [9]:
def mutate_residue(input_pose, res_index, new_aa, res_label = None):
    import pyrosetta.rosetta.core.pose as pose
    work_pose = packed_pose.to_pose(input_pose)
    # Annotate strucure with reslabel, for use in downstream protocol
    # Add parameters as score, for use in downstream analysis
    if res_label:
        work_pose.pdb_info().add_reslabel(res_index, res_label)
        pose.setPoseExtraScore(work_pose, "mutation_index", res_index)
        pose.setPoseExtraScore(work_pose, "mutation_aa", new_aa)
    if len(new_aa) == 1:
        new_aa = str.upper(Bio.SeqUtils.seq3(new_aa))
    assert new_aa in map(str.upper, IUPACData.protein_letters_3to1)
    protocol = """
        <MutateResidue name="mutate" new_res="%(new_aa)s" target="%(res_index)i" />
        <Add mover_name="mutate"/>
    """ % locals()
    return rosetta_scripts.SingleoutputRosettaScriptsTask(protocol)(work_pose)
In [10]:
refine = """

    <ResiduePDBInfoHasLabel name="mutation" property="mutation" />
    <Not name="not_neighbor">
      <Neighborhood selector="mutation" distance="12.0" />
    <RestrictToRepacking name="only_pack"/>
    <OperateOnResidueSubset name="only_repack_neighbors" selector="not_neighbor" >

    <PackRotamersMover name="pack_area" task_operations="only_pack,only_repack_neighbors" />
    <Add mover="pack_area"/>
refine_mutation = rosetta_scripts.SingleoutputRosettaScriptsTask(refine)

Mutation and pack

Job distribution via multiprocessing

In [11]:
from multiprocessing import Pool
import itertools
with pyrosetta.distributed.utility.log.LoggingContext(logging.getLogger("rosetta"), level=logging.WARN):
    with Pool() as p:
        work = [
            (input_pose, i, aa, "mutation")
            for i, aa in itertools.product(range(1, len(packed_pose.to_pose(input_pose).residues) + 1), IUPACData.protein_letters)
        mutations = p.starmap(mutate_residue, work)

Job distribution via dask

In [12]:
if not os.getenv("DEBUG"):
    import dask.distributed
    cluster = dask.distributed.LocalCluster(n_workers=1, threads_per_worker=1)
    client = dask.distributed.Client(cluster)

    refinement_tasks = [client.submit(refine_mutation, mutant) for mutant in mutations]"refining")
    refinements = [task.result() for task in refinement_tasks]


Analysis of delta score

In [13]:
if not os.getenv("DEBUG"):
    result_frame = pandas.DataFrame.from_records(packed_pose.to_dict(refinements))
    result_frame["delta_total_score"] = result_frame["total_score"] - input_pose.scores["total_score"] 
    result_frame["mutation_index"] = list(map(int, result_frame["mutation_index"]))
In [14]:
if not os.getenv("DEBUG"):
    matplotlib.rcParams['figure.figsize'] = [24.0, 8.0]
        result_frame.pivot("mutation_aa", "mutation_index", "delta_total_score"),
        cmap="RdBu_r", center=0, vmax=50)
<matplotlib.axes._subplots.AxesSubplot at 0x155f7ca90>
In [ ]: