#!/usr/bin/env python # coding: utf-8 # # *This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks); # content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).* # # < [*De Novo* Parametric Backbone Design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.06-Introduction-to-Parametric-backbone-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [**Point Mutation Scan**](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.08-Point-Mutation-Scan.ipynb) >
# # *De Novo* Protein Design with PyRosetta
# Keywords:
#
# ## Overview
# **Make sure you are in the directory with the `.pdb` files:**
#
# `cd google_drive/MyDrive/student-notebooks/`
# *Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`.
# *Note:* This Jupyter notebook requires the PyRosetta distributed layer which is obtained by building PyRosetta with the `--serialization` flag or installing PyRosetta from the RosettaCommons conda channel (for more information, visit: http://www.pyrosetta.org/dow).
# In[ ]:
get_ipython().system('pip install pyrosettacolabsetup')
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()
import Bio.Data.IUPACData as IUPACData
import Bio.SeqUtils
import logging
logging.basicConfig(level=logging.INFO)
import os
import pyrosetta
import pyrosetta.distributed
import pyrosetta.distributed.viewer as viewer
import site
import sys
# Initialize PyRosetta:
# In[ ]:
flags = """
-linmem_ig 10
-ignore_unrecognized_res 1
-mute core.select.residue_selector.SecondaryStructureSelector
-mute core.select.residue_selector.PrimarySequenceNeighborhoodSelector
-mute protocols.DsspMover
"""
pyrosetta.distributed.init(flags)
# Let's setup the pose of a *de novo* helical bundle from the PDB for downstream design: https://www.rcsb.org/structure/5J0J
# In[ ]:
start_pose = pyrosetta.io.pose_from_file("inputs/5J0J.clean.pdb")
pose = start_pose.clone()
# ## Design Strategy
#
# Minimize the crystal structure coordinates, then convert chain "A" to poly-alanine, and then perform one-sided protein-protein interface design designing chain A while only re-packing the homotrimer interface residues of chains B and C! Therefore, we are designing a homotrimer into a heterotrimer. Furthermore, prevent backbone torsions from minimizing and only minimize the side-chains of the homotrimer interface and all of chain A using the `FastDesign` mover! After design, repack and minimize all side-chains.
# Prior to and after design, we want to relax the protein with a scorefunction that packs the rotamers and minimizes the side-chain degrees of freedom while optimizing for realistic energies. If you were to allow backbone minimization (which you may optionally choose to), we would want use a Cartesian scorefunction (in this case, `ref2015_cart.wts`) which automatically sets `cart_bonded` scoreterm to a weight of 1.0, which helps to close chain breaks in the backbone. Similarly, you might want to also turn on the `coordinate_constraint` scoreterm to penalize deviations of the backbone coordinates from their initial coordinates during minimization. In this tutorial, we demonstrate that concept but will prevent backbone torsions from being minimized (however, feel free to turn on backbone minimization!):
# In[ ]:
relax_scorefxn = pyrosetta.create_score_function("ref2015_cart.wts")
relax_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.coordinate_constraint, 1.0)
print("The starting pose total_score is {}".format(relax_scorefxn(start_pose)))
# For *design*, if we allowed backbone minimzation we again would turn on the `coordinate_constraint` scoreterm to penalize deviations of the backbone coordinates from their initial coordinates during minimization. Additionally, we will use "non-pairwise decomposable" scoreterms to guide the packer trajectories (i.e. fixed backbone sequence design trajectories) in favor of our hypothetical design requirements to solve our biological problem. In this tutorial, we will make use of the following additional scoreterms during design:
# - `aa_composition`: This scoring term is intended for use during design, to penalize deviations from a desired residue type composition. Applies to any amino acid composition requirements specified by the user within a ResidueSelector. This scoreterm also applies to the `AddHelixSequenceConstraints` mover which sets up ideal sequence constraints for each helix in a pose or in a selection.
# - `voids_penalty`: This scoring term is intended for use during design, to penalize buried voids or cavities and to guide the packer to design solutions in which all buried volume is filled with side-chains.
# - `aa_repeat`: This wholebody scoring term is intended for use during design, to penalize long stretches in which the same residue type repeats over and over (e.g. poly-Q sequences).
# - `buried_unsatisfied_penalty`: This scoring term is intended for use during design, to provide a penalty for buried hydrogen bond donors or acceptors that are unsatisfied.
# - `netcharge`: This scoring term is intended for use during design, to penalize deviations from a desired net charge in a pose or in a selection.
# - `hbnet`: This scoring term is intended for use during design, to provide a bonus for hydrogen bond network formation.
# In[ ]:
design_scorefxn = pyrosetta.create_score_function("ref2015_cart.wts")
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.coordinate_constraint, 10.0)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.aa_composition, 1.0)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.voids_penalty, 0.25)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.aa_repeat, 1.0)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.buried_unsatisfied_penalty, 1.0)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.hbnet, 1.0)
design_scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.netcharge, 1.0)
print("The starting pose total_score is {}".format(design_scorefxn(start_pose)))
# By using the `relax_scorefxn` before and after the `design_scorefxn`, we ensure that these "non-pairwise decomposable" scoreterms are not forcing unrealistic rotamers that would otherwise not be held in place without these additional scoreterms.
# Prior to any deviation from crystal structure coordinates, apply coordinate constraints to all of the backbone heavy atoms:
# In[ ]:
true_selector = pyrosetta.rosetta.core.select.residue_selector.TrueResidueSelector() # Select all residues
# Apply a virtual root onto the pose to prevent large lever-arm effects while minimizing with coordinate constraints
virtual_root = pyrosetta.rosetta.protocols.simple_moves.VirtualRootMover()
virtual_root.set_removable(True)
virtual_root.set_remove(False)
virtual_root.apply(pose)
# Construct the CoordinateConstraintGenerator
coord_constraint_gen = pyrosetta.rosetta.protocols.constraint_generator.CoordinateConstraintGenerator()
coord_constraint_gen.set_id("contrain_all_backbone_atoms!")
coord_constraint_gen.set_ambiguous_hnq(False)
coord_constraint_gen.set_bounded(False)
coord_constraint_gen.set_sidechain(False)
coord_constraint_gen.set_sd(1.0) # Sets a standard deviation of contrained atoms to (an arbitrary) 1.0 Angstroms RMSD. Set higher or lower for different results.
coord_constraint_gen.set_ca_only(False)
coord_constraint_gen.set_residue_selector(true_selector)
# Apply the CoordinateConstraintGenerator using the AddConstraints mover
add_constraints = pyrosetta.rosetta.protocols.constraint_generator.AddConstraints()
add_constraints.add_generator(coord_constraint_gen)
add_constraints.apply(pose)
# Prior to design, minimize with the `FastRelax` mover to optimize the pose within the `relax_scorefxn` scorefunction. Note: this takes ~1min 10s
# In[ ]:
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.InitializeFromCommandline())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.IncludeCurrent())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.NoRepackDisulfides())
tf.push_back(pyrosetta.rosetta.core.pack.task.operation.OperateOnResidueSubset(
pyrosetta.rosetta.core.pack.task.operation.PreventRepackingRLT(), true_selector)) # Set to RestrictToRepackingRLT for slower/better results
mm = pyrosetta.rosetta.core.kinematics.MoveMap()
mm.set_bb(False) # Set to true if desired
mm.set_chi(True)
mm.set_jump(False)
fast_relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn_in=relax_scorefxn, standard_repeats=1) # 2-5 repeats suggested for real applications
fast_relax.cartesian(True)
fast_relax.set_task_factory(tf)
fast_relax.set_movemap(mm)
fast_relax.minimize_bond_angles(True)
fast_relax.minimize_bond_lengths(True)
fast_relax.min_type("lbfgs_armijo_nonmonotone")
fast_relax.ramp_down_constraints(False)
if not os.getenv("DEBUG"):
get_ipython().run_line_magic('time', 'fast_relax.apply(pose)')
# Optionally, instead of running this you could reload the saved pose from a previously run trajectory:
#pose = pyrosetta.pose_from_file("minimized_start_pose.pdb")
# Let's check the delta `total_score` per residue after minimizing in the `relax_scorefxn` scorefunction:
# In[ ]:
if not os.getenv("DEBUG"):
initial_score_res = relax_scorefxn(start_pose)/start_pose.size()
final_score_res = relax_scorefxn(pose)/pose.size()
delta_total_score_res = final_score_res - initial_score_res
print("{0} kcal/(mol*res) - {1} kcal/(mol*res) = {2} kcal/(mol*res)".format(final_score_res, initial_score_res, delta_total_score_res))
# We can see that the crystal structure coordinates were not quite optimal according to the `relax_scorefxn` scorefunction. So which model is correct?
#
# By how many Angstroms RMSD did the backbone Cα atoms move?
# In[ ]:
YOUR-CODE-HERE
# For downstream analysis, we want to save the pose:
# In[ ]:
minimized_start_pose = pose.clone()
pyrosetta.dump_pdb(minimized_start_pose, "outputs/minimized_start_pose.pdb")
# ## *De Novo* Protein Design
#
# Prior to designing chain A, first let's make chain A poly-alanine so that we can re-design the sidechains onto the backbone:
# In[ ]:
# Since we will do direct pose manipulation, first remove the constraints
remove_constraints = pyrosetta.rosetta.protocols.constraint_generator.RemoveConstraints()
remove_constraints.add_generator(coord_constraint_gen)
remove_constraints.apply(pose)
# Obtain chain A and convert it to poly-alanine
# In[ ]:
keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1)))
keep_chA.apply(pose)
polyA_chA = pyrosetta.rosetta.protocols.pose_creation.MakePolyXMover(aa="ALA", keep_pro=0, keep_gly=0, keep_disulfide_cys=0)
polyA_chA.apply(pose)
# Obtain chains B and C
# In[ ]:
pose_chBC = minimized_start_pose.clone()
keep_chBC = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(res_start=str(pose_chBC.chain_begin(2)), res_end=str(pose_chBC.chain_end(3)-1))
keep_chBC.apply(pose_chBC)
# Append chains B and C onto the poly-alanine version of chain A
# In[ ]:
pyrosetta.rosetta.core.pose.append_pose_to_pose(pose1=pose, pose2=pose_chBC, new_chain=True)
# Pose is now considered to have only 2 chains.
# In[ ]:
pose.num_chains()
# Let's re-establish that there are 3 chains.
# In[ ]:
switch_chains = pyrosetta.rosetta.protocols.simple_moves.SwitchChainOrderMover()
switch_chains.chain_order("12")
switch_chains.apply(pose)
print(pose.pdb_info())
print("Now the number of chains = {}".format(pose.num_chains()))
# Re-apply backbone coordinate constraints:
# In[ ]:
virtual_root.apply(pose)
add_constraints.apply(pose)
# Have a look at the new pose, chain A in which is ready to be designed!
# Next, we need to apply certain movers with our design specifications to activate certain non-pairwise decomposable scoreterms in the `design_scorefxn` scorefunction, that will be implemented when we run the `FastDesign` mover (or any downstream mover that calls the packer).
#
# We will frequently be using the following residue selectors:
# In[ ]:
chain_A = pyrosetta.rosetta.core.select.residue_selector.ChainSelector("A")
chain_BC = pyrosetta.rosetta.core.select.residue_selector.NotResidueSelector(chain_A)
# Apply the `AddCompositionConstraintMover` mover, which utilizes the `aa_composition` scoreterm. Applying the following mover to the pose imposes the design constraints that the ResidueSelector have 40% aliphatic or aromatic residues other than leucine (i.e. ALA, PHE, ILE, MET, PRO, VAL, TRP, or TYR), and 5% leucines. For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/AACompositionEnergy
# In[ ]:
add_composition_constraint = pyrosetta.rosetta.protocols.aa_composition.AddCompositionConstraintMover()
add_composition_constraint.create_constraint_from_file_contents("""
PENALTY_DEFINITION
OR_PROPERTIES AROMATIC ALIPHATIC
NOT_TYPE LEU
FRACT_DELTA_START -0.05
FRACT_DELTA_END 0.05
PENALTIES 1 0 1 # The above two lines mean that if we're 5% below or 5% above the desired content, we get a 1-point penalty.
FRACTION 0.4 # Forty percent aromatic or aliphatic, but not leucine
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT
END_PENALTY_DEFINITION
PENALTY_DEFINITION
TYPE LEU
DELTA_START -1
DELTA_END 1
PENALTIES 1 0 1
FRACTION 0.05 # Five percent leucine
BEFORE_FUNCTION CONSTANT
AFTER_FUNCTION CONSTANT
END_PENALTY_DEFINITION
""")
add_composition_constraint.add_residue_selector(chain_A)
add_composition_constraint.apply(pose)
# Apply the `AddHelixSequenceConstraints` mover, which utilizes the `aa_composition` scoreterm. By default, this mover adds five types of sequence constraints to the designable residues in each alpha helix in the pose. Any of these behaviours may be disabled or modified by invoking advanced options, but no advanced options need be set in most cases. The five types of sequence constraints are:
#
# - A strong sequence constraint requiring at least two negatively-charged residues in the first (N-terminal) three residues of each alpha-helix.
#
# - A strong sequence constraint requiring at least two positively-charged residues in the last (C-terminal) three residues of each alpha-helix.
#
# - A weak but strongly ramping sequence constraint penalizing helix-disfavoring residue types (by default, Asn, Asp, Ser, Gly, Thr, and Val) throughout each helix. (A single such residue is sometimes tolerated, but the penalty for having more than one residue in this category increases quadratically with the count of helix-disfavouring residues.)
#
# - A weak sequence constraint coaxing the helix to have 10% alanine. Because this constraint is weak, deviations from this value are tolerated, but this should prevent an excessive abundance of alanine residues.
#
# - A weak sequence constraint coaxing the helix to have at least 25% hydrophobic content. This constraint is also weak, so slightly less hydrophobic helices will be tolerated to some degree. Note that alanine is not considered to be "hydrophobic" within Rosetta.
#
# ### For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/AACompositionEnergy
# In[ ]:
add_helix_sequence_constraints = pyrosetta.rosetta.protocols.aa_composition.AddHelixSequenceConstraintsMover()
add_helix_sequence_constraints.set_residue_selector(chain_A)
add_helix_sequence_constraints.apply(pose)
# Note: the `aa_repeat` scoreterm works out-of-the-box, and does not need to be applied to the pose to work, it just needs to have a weight of >0 in the scorefunction used by the packer. It imposes a penalty for each stretch of repeating amino acids, with the penalty value depending nonlinearly on the length of the repeating stretch. By default, 1- or 2-residue stretches incur no penalty, 3-residue stretches incur a penalty of +1, 4-residue stretches incur a penalty of +10, and 5-residue stretches or longer incur a penalty of +100. Since the term is sequence-based, it is really only useful for design -- that is, it will impose an identical penalty for a fixed-sequence pose, regardless its conformation. This also means that the term has no conformational derivatives: the minimizer ignores it completely. The term is not pairwise-decomposible, but has been made packer-compatible, so it can direct the sequence composition during a packer run. For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/Repeat-stretch-energy
# Similarly, the `voids_penalty` scoreterm does not need to be applied to the pose to work, it just needs to have a weight of >0 in the scorefunction used by the packer. For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/VoidsPenaltyEnergy
# Similarly, the `buried_unsatisfied_penalty` scoreterm does not need to be applied to the pose to work, it just needs to have a weight of >0 in the scorefunction used by the packer. For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/BuriedUnsatPenalty
# Similarly, the `hbnet` scoreterm does not need to be applied to the pose to work, it just needs to have a weight of >0 (ideally between 1.0 to 10.0) in the scorefunction used by the packer. For documentation, see: https://www.rosettacommons.org/docs/latest/rosetta_basics/scoring/HBNetEnergy
# Apply the `AddNetChargeConstraintMover` mover, which utilizes the `netcharge` scoreterm. In this case, we require that the net charge in chain A must be exactly 0.
# In[ ]:
add_net_charge_constraint = pyrosetta.rosetta.protocols.aa_composition.AddNetChargeConstraintMover()
add_net_charge_constraint.create_constraint_from_file_contents("""
DESIRED_CHARGE 0 #Desired net charge is zero.
PENALTIES_CHARGE_RANGE -1 1 #Penalties are listed in the observed net charge range of -1 to +1.
PENALTIES 1 0 1 #The penalties are 1 for an observed charge of -1, 0 for an observed charge of 0, and 1 for an observed charge of +1.
BEFORE_FUNCTION QUADRATIC #Ramp quadratically for observed net charges of -2 or less.
AFTER_FUNCTION QUADRATIC #Ramp quadratically for observed net charges of +2 or greater.
""")
add_net_charge_constraint.add_residue_selector(chain_A)
add_net_charge_constraint.apply(pose)
# Specify a custom `relaxscript` that optimizes the `ramp_repack_min` weights to prevent too many alanines from being designed (demonstrated at Pre-RosettaCON 2018):
# Specify TaskOperations to be applied to chain A. In this case, let's use the latest LayerDesign implementation (Note: this is still being actively developed) using the XmlObjects class.
# In[ ]:
layer_design_task = pyrosetta.rosetta.protocols.rosetta_scripts.XmlObjects.create_from_string("""