< Side-Chain Packing | Contents | Index | Docking >
Keywords: generate_resfile_from_pdb(), generate_resfile_from_pose(), create_packer_task(), mutate_residue()
# Notebook setup
import sys
if 'google.colab' in sys.modules:
!pip install pyrosettacolabsetup
import pyrosettacolabsetup
pyrosettacolabsetup.setup()
print ("Notebook is set for PyRosetta use in Colab. Have fun!")
Make sure you are in the directory with the pdb files:
cd google_drive/My\ Drive/student-notebooks/
# From previous section:
from pyrosetta import *
from pyrosetta.teaching import *
pyrosetta.init()
pose = pose_from_pdb("inputs/1YY8.clean.pdb")
start_pose = Pose()
start_pose.assign(pose)
scorefxn = get_fa_scorefxn()
core.init: Checking for fconfig files in pwd and ./rosetta/flags core.init: Rosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37 core.init: command: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database core.init: 'RNG device' seed mode, using '/dev/urandom', seed=114166398 seed_offset=0 real_seed=114166398 core.init.random: RandomGenerator:init: Normal mode, seed=114166398 RG_type=mt19937
/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'. from rosetta.core.scoring import *
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set. Created 696 residue types core.chemical.GlobalResidueTypeSet: Total time to initialize 1.06092 seconds. core.import_pose.import_pose: File '1YY8.clean.pdb' automatically determined to be of type PDB core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CD on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NE on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CZ on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NH1 on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NH2 on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: CD on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: OE1 on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: NE2 on residue GLN:NtermProteinFull 214 core.conformation.Conformation: Found disulfide between residues 23 88 core.conformation.Conformation: current variant for 23 CYS core.conformation.Conformation: current variant for 88 CYS core.conformation.Conformation: current variant for 23 CYD core.conformation.Conformation: current variant for 88 CYD core.conformation.Conformation: Found disulfide between residues 134 194 core.conformation.Conformation: current variant for 134 CYS core.conformation.Conformation: current variant for 194 CYS core.conformation.Conformation: current variant for 134 CYD core.conformation.Conformation: current variant for 194 CYD core.conformation.Conformation: Found disulfide between residues 235 308 core.conformation.Conformation: current variant for 235 CYS core.conformation.Conformation: current variant for 308 CYS core.conformation.Conformation: current variant for 235 CYD core.conformation.Conformation: current variant for 308 CYD core.conformation.Conformation: Found disulfide between residues 359 415 core.conformation.Conformation: current variant for 359 CYS core.conformation.Conformation: current variant for 415 CYS core.conformation.Conformation: current variant for 359 CYD core.conformation.Conformation: current variant for 415 CYD core.pack.pack_missing_sidechains: packing residue number 18 because of missing atom number 6 atom name CG core.pack.pack_missing_sidechains: packing residue number 214 because of missing atom number 6 atom name CG core.pack.task: Packer task: initialize from command line() core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015 core.scoring.etable: Starting energy table calculation core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6) core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6) core.scoring.etable: Finished calculating energy tables. basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated. basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat core.scoring.elec.util: Read 40 countpair representative atoms core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true. core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated. core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'. core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.428411 seconds to load from binary core.pack.pack_rotamers: built 43 rotamers at 2 positions. core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
Design calculations can be accomplished simply by packing side chains with a rotamer set that includes all amino acid types. That is, when the Monte Carlo routine swaps rotamers, it could replace the existing side chain with another conformation of the same residue or some conformation of a different residue type. Trial mutations are accepted or rejected with the Metropolis criterion, and the standard full-atom energy function is supplemented by a reference energy term, ref
, which represents the relative energies of each residue type in an unfolded peptide.
Design operations are easiest to specify through a data file called a “resfile.” You can create a resfile for a given pdb file or pose using the following toolbox methods:
from pyrosetta.toolbox import generate_resfile_from_pdb
generate_resfile_from_pdb("1YY8.clean.pdb", "1YY8.resfile")
from pyrosetta.toolbox import generate_resfile_from_pose
generate_resfile_from_pose(pose, "1YY8.resfile")
### BEGIN SOLUTION
from pyrosetta.toolbox import generate_resfile_from_pdb
generate_resfile_from_pdb("inputs/1YY8.clean.pdb", "1YY8.resfile")
from pyrosetta.toolbox import generate_resfile_from_pose
generate_resfile_from_pose(pose, "1YY8.resfile")
### END SOLUTION
core.import_pose.import_pose: File '1YY8.clean.pdb' automatically determined to be of type PDB core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CD on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NE on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CZ on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NH1 on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: NH2 on residue ARG 18 core.conformation.Conformation: [ WARNING ] missing heavyatom: CG on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: CD on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: OE1 on residue GLN:NtermProteinFull 214 core.conformation.Conformation: [ WARNING ] missing heavyatom: NE2 on residue GLN:NtermProteinFull 214 core.conformation.Conformation: Found disulfide between residues 23 88 core.conformation.Conformation: current variant for 23 CYS core.conformation.Conformation: current variant for 88 CYS core.conformation.Conformation: current variant for 23 CYD core.conformation.Conformation: current variant for 88 CYD core.conformation.Conformation: Found disulfide between residues 134 194 core.conformation.Conformation: current variant for 134 CYS core.conformation.Conformation: current variant for 194 CYS core.conformation.Conformation: current variant for 134 CYD core.conformation.Conformation: current variant for 194 CYD core.conformation.Conformation: Found disulfide between residues 235 308 core.conformation.Conformation: current variant for 235 CYS core.conformation.Conformation: current variant for 308 CYS core.conformation.Conformation: current variant for 235 CYD core.conformation.Conformation: current variant for 308 CYD core.conformation.Conformation: Found disulfide between residues 359 415 core.conformation.Conformation: current variant for 359 CYS core.conformation.Conformation: current variant for 415 CYS core.conformation.Conformation: current variant for 359 CYD core.conformation.Conformation: current variant for 415 CYD core.pack.pack_missing_sidechains: packing residue number 18 because of missing atom number 6 atom name CG core.pack.pack_missing_sidechains: packing residue number 214 because of missing atom number 6 atom name CG core.pack.task: Packer task: initialize from command line() core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015 core.pack.pack_rotamers: built 43 rotamers at 2 positions. core.pack.interaction_graph.interaction_graph_factory: Instantiating DensePDInteractionGraph
Inside the resfile you will see:
NATAA
USE_INPUT_SC
start
This means that all of the native amino acids will remain the same, but we will allow repacking to other rotamers. You can change "NATAA" to any of the phrases in the table below.
Also, under "start", you can add exceptions for certain amino acids. Let's do an example.
Name | Definition |
---|---|
NATRO | use native amino acid and native rotamer (does not repack) |
NATAA | use native amino acid but allow repacking to other rotamers |
PIKAA ILV | use only the following amino acids (ILV) and allow repacking between them |
ALLAA | use all amino acids and all repacking |
Edit the resfile to force residue 49 to be glutamic acid (49 A PIKAA E
) and ensure all other residues cannot be redesigned (change NATAA
to NATRO
). Save the file as 1YY8-K49E.resfile
.
Your resfile should look like this:
NATRO
USE_INPUT_SC
start
49 A PIKAA E
Create a new task for design from the resfile:
from pyrosetta.rosetta.core.pack.task import TaskFactory, parse_resfile
task_design = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design, "1YY8-K49E.resfile")
### BEGIN SOLUTION
from pyrosetta.rosetta.core.pack.task import TaskFactory, parse_resfile
task_design = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design, "expected_outputs/1YY8-K49E.resfile")
### END SOLUTION
Score the original start_pose
conformation from the 1YY8 pdb for reference. Create a new PackResiduesMover
with the task_design
and use it to mutate residue 49 to glutamic acid. Confirm you mutated the residue by printing residue 49.
Question: What is the predicted ΔG of the mutation? Is this a stabilizing mutation?
pose.assign(start_pose)
pack_mover = PackRotamersMover(scorefxn, task_design)
print(pose.residue(49))
pack_mover.apply(pose)
print(pose.residue(49))
print(scorefxn(pose) - scorefxn(start_pose))
### BEGIN SOLUTION
pose.assign(start_pose)
pack_mover = PackRotamersMover(scorefxn, task_design)
print(pose.residue(49))
pack_mover.apply(pose)
print(pose.residue(49))
print(scorefxn(pose) - scorefxn(start_pose))
### END SOLUTION
Residue 49: LYS (LYS, K): Base: LYS Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB CG CD CE NZ 1HB 2HB 1HG 2HG 1HD 2HD 1HE 2HE 1HZ 2HZ 3HZ Atom Coordinates: N : 32.097, 27.128, 7.923 CA : 31.663, 28.176, 7.007 C : 30.939, 27.471, 5.878 O : 31.191, 26.303, 5.597 CB : 32.852, 28.971, 6.449 CG : 33.743, 28.165, 5.512 CD : 35.058, 28.866, 5.167 CE : 35.795, 28.002, 4.134 NZ : 37.148, 28.45, 3.923 H : 32.6902, 26.3872, 7.57742 HA : 31.0202, 28.8679, 7.55234 1HB : 32.4846, 29.8416, 5.90508 2HB : 33.4654, 29.335, 7.27346 1HG : 33.9859, 27.2076, 5.97453 2HG : 33.2119, 27.9736, 4.58014 1HD : 34.8472, 29.8571, 4.76287 2HD : 35.6568, 28.9812, 6.07042 1HE : 35.8169, 26.9678, 4.47438 2HE : 35.2623, 28.0373, 3.18377 1HZ : 37.5965, 27.8576, 3.23856 2HZ : 37.1393, 29.4035, 3.58859 3HZ : 37.6581, 28.4036, 4.79339 Mirrored relative to coordinates in ResidueType: FALSE core.pack.pack_rotamers: built 6 rotamers at 1 positions. core.pack.interaction_graph.interaction_graph_factory: Instantiating PDInteractionGraph Residue 49: GLU (GLU, E): Base: GLU Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS POLAR CHARGED NEGATIVE_CHARGE METALBINDING ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB CG CD OE1 OE2 1HB 2HB 1HG 2HG Atom Coordinates: N : 32.097, 27.128, 7.923 CA : 31.663, 28.176, 7.007 C : 30.939, 27.471, 5.878 O : 31.191, 26.303, 5.597 CB : 32.841, 28.9963, 6.4766 CG : 33.8415, 28.199, 5.65185 CD : 34.9945, 29.0322, 5.16552 OE1: 35.322, 29.9961, 5.8151 OE2: 35.5487, 28.7043, 4.1428 H : 32.6902, 26.3872, 7.57742 HA : 31.0202, 28.8679, 7.55234 1HB : 32.4672, 29.8096, 5.85432 2HB : 33.3783, 29.4442, 7.31266 1HG : 34.2295, 27.3826, 6.26034 2HG : 33.3267, 27.7648, 4.79599 Mirrored relative to coordinates in ResidueType: FALSE 5.00381001783191
Note the residue reference energy term (ref
) in the scoring function.
Question: What is this value before and after you mutated the residue? What does this energy represent?
Create a new resfile that allows residue 49 to be designed freely (49 A ALLAA
) and call it 1YY8-K49All.resfile
.
Create a new PackerTask
and PackRotamersMover
and apply it.
Question: What residue does Rosetta choose? Why?
pose.assign(start_pose)
task_design_all = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design_all, "1YY8-K49All.resfile")
pack_mover_all = PackRotamersMover(scorefxn, task_design_all)
pack_mover_all.apply(pose)
print(pose.residue(49))
### BEGIN SOLUTION
pose.assign(start_pose)
task_design_all = TaskFactory.create_packer_task(pose)
parse_resfile(pose, task_design_all, "expected_outputs/1YY8-K49All.resfile")
pack_mover_all = PackRotamersMover(scorefxn, task_design_all)
pack_mover_all.apply(pose)
print(pose.residue(49))
### END SOLUTION
core.pack.pack_rotamers: built 96 rotamers at 1 positions. core.pack.interaction_graph.interaction_graph_factory: Instantiating PDInteractionGraph Residue 49: LYS (LYS, K): Base: LYS Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA Variant types: Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB CG CD CE NZ 1HB 2HB 1HG 2HG 1HD 2HD 1HE 2HE 1HZ 2HZ 3HZ Atom Coordinates: N : 32.097, 27.128, 7.923 CA : 31.663, 28.176, 7.007 C : 30.939, 27.471, 5.878 O : 31.191, 26.303, 5.597 CB : 32.8374, 29.0038, 6.48277 CG : 33.8145, 28.2273, 5.61012 CD : 34.9712, 29.1063, 5.15855 CE : 36.0034, 28.3064, 4.37754 NZ : 35.4575, 27.8019, 3.08838 H : 32.6902, 26.3872, 7.57742 HA : 31.0202, 28.8679, 7.55234 1HB : 32.4591, 29.8421, 5.89706 2HB : 33.3952, 29.4171, 7.32334 1HG : 34.2106, 27.3804, 6.17181 2HG : 33.2943, 27.8464, 4.73181 1HD : 34.5922, 29.9104, 4.52621 2HD : 35.453, 29.5497, 6.02994 1HE : 36.869, 28.9339, 4.17067 2HE : 36.3331, 27.4565, 4.97492 1HZ : 36.171, 27.2776, 2.60211 2HZ : 34.6642, 27.2026, 3.26944 3HZ : 35.1659, 28.5822, 2.51737 Mirrored relative to coordinates in ResidueType: FALSE
Create your own resfile that will restrict residue 49 to only negatively charged residues using the resfile line 49 A PIKAA DE
and re-apply the design mover.
Question: Now what residue is chosen? What is the new residue energy, and why (physically) is it less favorable than the last design?
Let’s try to make this design more favorable. Select several surrounding residues for design, and set them also to enable mutations to any residue. Call the design mover again.
Question: Now what do you find?
It should be noted that PyRosetta includes a handy toolbox method mutate_residue() that will change a specified residue in a given pose into another. However, the rotamer of this new residue will not be optimized. For example:
from pyrosetta.toolbox import mutate_residue
pose.assign(start_pose)
print(pose.residue(49))
mutate_residue(pose, 49, 'E')
print(pose.residue(49))
Refinement and discrimination. Download the “single misfold” decoy set from the Decoys ’R Us repository at dd.compbio.washington.edu/ddownload.cgi?misfold. (Documentation for this project is at dd.compbio.washington.edu.) This repository has a single “correct” and “incorrect” predicted structure for several proteins. For this exercise, analyze pdbs 2CI2 and 2CRO; each has two “incorrect” structures offered. (Technical note: These decoys have an empty occupancy field in the PDB ATOM records; a value of 1 needs to be added before Rosetta will load these structures.)
Write a program that will calculate and output the score for each decoy (i) as is from the PDB file, (ii) after packing only, (iii) after minimization only, and (iv) after packing and minimizing. For each of the four cases, compare the scores of the “correct” structure with those of the “incorrect” structure. Which schemes successfully discriminate the correct structures?
Write a refinement protocol that will iterate between side-chain packing, small and shear moves, and minimization. Where is the best place to position the Monte Carlo acceptance test? Test your protocol by making 10 independently-refined structures for the correct and incorrect decoys of 2CRO from the Decoys ’R Us single misfold set. Is this protocol able to discriminate the correct decoy? Submit your code.
HIV-1 protease is a major drug target for antiretroviral therapies. Protease inhibitors are designed from substrate peptide mimics. We will attempt to take a natural substrate peptide of HIV-1 protease and design it for improved binding — potentially to serve as a good template for drug design. Use PDB file 1KJG for the following analysis.
Turn on side-chain packing for the protease active site (residues 8, 23, 25, 29, 30, 32, 45, 47, 50, 53, 82, and 84 of both chains A and B) and for the peptide (residues 2–9 on chain P; all of these numbers follow the PDB numbering).
Repack the above side chains and then energy minimize those same side chains with the backbone fixed. Generate 10 decoys and record the energies for each decoy. This will represent the reference state: the wild-type peptide bound to the protease.
For residue 2 of the peptide (chain P), allow repacking to any of the 20 amino acid residues, while leaving the packing and side-chain minimization the same as in step b. Generate 10 decoys and record the energies. These will represent single mutants at that residue position.
Repeat step c for each of the other 8 residues in the substrate peptide.
Take the lowest energy for each mutation position and compare it to the lowest energy for the wild type. Do single mutants at any of these positions improve the energy over the wild type? Which ones? By how much? Which energy components are mostly responsible?
Which peptide residue positions are easiest to improve? Which positions are the hardest?
Are there any other trends? Hydrophobic vs. polar, bulky residues vs. small residues, etc.?
Altman et al. (Proteins 2008) found, using their own computational design algorithm, that the most favorable sequences were a triple mutant E3D/T4I/V6L, a single mutant T4V, and a single mutant E3Q. How do their results compare with yours?
Natural substrates are often sub-optimal binders. Why would this be advantageous?
Effect of backbone conformation on design. HIV-1 protease is promiscuous, meaning it can cleave a wide range of peptides beyond the ten natural substrates of the virus. Let’s examine the preferences of the enzyme through Rosetta design calculations.
Download HIV-1 protease in complex with CA-P2 peptide (1F7A). Select the eight peptide residues for unrestricted design and let Rosetta redesign the substrate sequence. What is the new sequence and how does it compare to the original? What percent of the original sequence was optimal for its structure?
Download HIV-1 protease in complex with RT-RH peptide (1KJG). (Note that the enzyme is the same here, but it is crystallized with a different substrate.) Again, design the eight substrate residues with Rosetta. What percent of this substrate sequence is optimal for this crystal structure? ____%
How do the designed sequences of (a) and (b) compare? Why should they be the same? Why would they not be the same? What are the implications for the field of computational protein design?
Write a program which iterates between design of all residues of a protein and refinement via small, shear, and minimization moves.
What is the thermodynamic meaning of the ref energy term, and what does it correspond to physically? During evolution, the genome sequence may mutate to cause protein sequence changes. Alternately, one could consider the difference in evolutionary propensities for each residue type. How could you derive reference energies from sequence data, and what would that mean?
How do Kuhlman & Baker fit the reference energies in their 2000 PNAS paper?
S. C. Lovell et al., “The penultimate rotamer library,” Proteins 40, 389-408 (2000).
R. L. Dunbrack & F. E. Cohen, “Bayesian statistical analysis of protein side-chain rotamer preferences,” Protein Sci. 6, 1661-1681 (1997)
< Side-Chain Packing | Contents | Index | Docking >