
Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [ ]:

NAME = ""
COLLABORATORS = ""

This notebook contains material from PyRosetta; content is available on Github.

< RosettaAntibody Framework | Contents | Index | RosettaCarbohydrates >

RosettaAntibodyDesign¶

Notes¶

This tutorial will walk you through how to use RosettaAntibodyDesign in PyRosetta. You should also go through the parallel distribution workshop, as you will most likely need to create many decoys for some of these design tasks. Note that we are using the XML interface to the code here for simplicity (and because I had a C++ workshop I am converting - truth be told). The code-level interface is as robust as the XML, but requires more knowledge to use. You are welcome to play around with it - all functions have descriptions and all options can be changed through code.

Grab a coffee, take a breath, and let's learn how to design some antibodies!

Citation¶

Rosetta Antibody Design (RAbD): A General Framework for Computational Antibody Design, PLOS Computational Biology, 4/27/2018

Jared Adolf-Bryfogle, Oleks Kalyuzhniy, Michael Kubitz, Brian D. Weitzner, Xiaozhen Hu, Yumiko Adachi, William R. Schief, Roland L. Dunbrack Jr.

Manual¶

The full RAbD manual can be found here: https://www.rosettacommons.org/docs/latest/application_documentation/antibody/RosettaAntibodyDesign

Overview¶

RosettaAntibodyDesign (RAbD) is a generalized framework for the design of antibodies, in which a user can easily tailor the run to their project needs. The algorithm is meant to sample the diverse sequence, structure, and binding space of an antibody-antigen complex. An app is available, and all components can be used within RosettaScripts for easy scripting of antibody design and incorporation into other Rosetta protocols.

The framework is based on rigorous bioinformatic analysis and is rooted in our recent clustering of antibody CDR regions. It uses the North/Dunbrack CDR definition as outlined in the North/Dunbrack clustering paper. A new clustering paper will be out in the next year, and this new analysis will be incorporated into RAbD.

The supplemental methods section of the published paper has all details of the RosettaAntibodyDesign method. This manual serves to get you started running RAbD in typical use fashions.

Algorithm¶

Broadly, the RAbD protocol consists of alternating outer and inner Monte Carlo cycles. Each outer cycle consists of randomly choosing a CDR (L1, L2, etc.) from those CDRs set to design, randomly choosing a cluster and then a structure from that cluster from the database according to the input instructions, and optionally grafting that CDR's structure onto the antibody framework in place of the existing CDR (GraftDesign). The program then performs N rounds of the inner cycle, consisting of sequence design (SeqDesign) using cluster-based sequence profiles and structural constraints, energy minimization, and optional docking. Each inner cycle structurally optimizes the backbone and repacks side chains of the CDR chosen in the outer cycle as well as optional neighbors in order to optimize interactions of the CDR with the antigen and other CDRs.

Backbone dihedral angle (CircularHarmonic) constraints derived from the cluster data are applied to each CDR to limit deleterious structural perturbations. Amino acid changes are typically sampled from profiles derived for each CDR cluster in PyIgClassify. Conservative amino acid substitutions (according to the BLOSUM62 substitution matrix) may be performed when too few sequences are available to produce a profile (e.g., for H3). After each inner cycle is completed, the new sequence and structure are accepted according to the Metropolis Monte Carlo criterion. After N rounds within the inner cycle, the program returns to the outer cycle, at which point the energy of the resulting design is compared to the previous design in the outer cycle. The new design is accepted or rejected according to the Monte Carlo criterion.
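The nested cycle structure described above can be sketched in plain Python. This is a toy illustration only: `toy_energy`, the integer "design" state, and the two kinds of moves are hypothetical stand-ins and do not call Rosetta. Only the control flow (a large outer "graft-like" move, N small inner moves, and a Metropolis test at both levels) mirrors the description.

```python
import math
import random

def metropolis_accept(old_e, new_e, kt, rng):
    """Metropolis criterion: always accept downhill, sometimes accept uphill."""
    if new_e <= old_e:
        return True
    return rng.random() < math.exp(-(new_e - old_e) / kt)

def toy_energy(state):
    """Hypothetical energy: a parabola with its minimum at state == 10."""
    return (state - 10) ** 2

def run_toy_rabd(outer_rounds=5, inner_rounds=3, kt=1.0, seed=0):
    rng = random.Random(seed)
    current = 0          # design accepted by the outer cycle so far
    best = current       # lowest-energy design seen
    for _ in range(outer_rounds):
        # Outer cycle: a large jump, standing in for grafting a new CDR.
        trial = current + rng.randint(-5, 5)
        # Inner cycle: small refinements, each with its own Metropolis test.
        for _ in range(inner_rounds):
            candidate = trial + rng.choice([-1, 1])
            if metropolis_accept(toy_energy(trial), toy_energy(candidate), kt, rng):
                trial = candidate
        # Back in the outer cycle: compare against the previous outer design.
        if metropolis_accept(toy_energy(current), toy_energy(trial), kt, rng):
            current = trial
        if toy_energy(current) < toy_energy(best):
            best = current
    return best, toy_energy(best)

best_state, best_energy = run_toy_rabd()
print(best_state, best_energy)
```

Raising `outer_rounds` in this toy plays the same role as raising -outer_cycle_rounds: more large moves are attempted before the run ends.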

If optimizing the antibody-antigen orientation during the design (dock), SiteConstraints are automatically used to keep the CDRs (paratope) facing the antigen surface. These are termed ParatopeSiteConstraints. Optionally, one can enable constraints that keep the paratope of the antibody around a target epitope (antigen binding site). These are called ParatopeEpitopeSiteConstraints, as the constraints are between the paratope and the epitope. The epitope is automatically determined as the interface residues around the paratope on input into the program; however, any residue(s) can be set as the epitope to limit unwanted movement and sampling of the antibody. See the examples and options below.

More detail on the algorithm can be found in the published paper.

General Setup and Inputs¶

Antibody Design Database

This app requires the Rosetta Antibody Design Database. A database of antibodies from the original North Clustering paper is included in Rosetta and is used as the default. An updated database (which is currently updated bi-yearly) can be downloaded here: http://dunbrack2.fccc.edu/PyIgClassify/.

For C++, it should be placed in Rosetta/main/database/sampling/antibodies/. For PyRosetta, use the command-line option -antibody_database and set it to the full path of the downloaded database within the init() function, as you have done in the past. It is recommended to use this up-to-date database for production runs. For this tutorial, we will use the database within Rosetta.
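As a minimal sketch of the PyRosetta route, the helper below only builds the option string; it does not call Rosetta. The function name and the download location are hypothetical, so swap in your own path.

```python
from pathlib import Path

def antibody_database_flag(database_path):
    """Return the option string for a downloaded antibody database.

    The text above asks for the full path, so relative paths are rejected.
    """
    path = Path(database_path)
    if not path.is_absolute():
        raise ValueError(f"Expected a full path, got: {database_path}")
    return f"-antibody_database {path}"

# Hypothetical download location; replace it with your own.
flag = antibody_database_flag("/data/antibody_database.db")
print(flag)
```

The resulting string would be appended to whatever other options you pass to init().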
Starting Structure

The protocol begins with the three-dimensional structure of an antibody-antigen complex. Designs should start with an antibody bound to a target antigen (however optimizing just the antibody without the complex is also possible). Camelid antibodies are fully supported. This structure may be an experimental structure of an existing antibody in complex with its antigen, a predicted structure of an existing antibody docked computationally to its antigen, or even the best scoring result of low-resolution docking a large number of unrelated antibodies to a desired epitope on the structure of a target antigen as a prelude to de novo design.

The program CAN computationally design an antibody to anywhere on the target protein, but it is recommended to place the antibody at the target epitope. It is beyond the scope of this program to determine potential epitopes for binding; however, servers and programs exist to predict these. Automatic SiteConstraints can be used to further limit the design to target regions.
Model Numbering and Light Chain identification

The input PDB file must be renumbered to the AHo Scheme and the light chain gene must be identified. This can be done through the PyIgClassify Server.

On input into the program, Rosetta assigns our CDR clusters using the same methodology as PyIgClassify. The RosettaAntibodyDesign protocol is then driven by a set of command-line options and a set of design instructions provided as an input file that controls which CDR(s) are designed and how. Details and example command lines and instruction files are provided below.

The gene of the light chain should always be set on the command line using the option -light_chain; the accepted values are lambda and kappa. PyIgClassify will identify the gene of the light chain.

For this tutorial, the starting antibody is renumbered for you.
Notes for Tutorial Shortening

Always set the option, -outer_cycle_rounds to 5 in order to run these examples quickly. The default is 25. We include this in our common options file that is read in by Rosetta at the start. We will only be outputting a single structure, but typical use of the protocol is with default settings of -outer_cycle_rounds and an nstruct of at least 1000, with 5000-10000 recommended for jobs that are doing a lot of grafting. For De-novo design runs, one would want to go even higher. Note that the Docking stage increases runtime significantly as well.

The total number of rounds is outer_cycle_rounds * nstruct.
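As a quick worked example of that product, using the settings mentioned above (the function name is just for illustration):

```python
def total_design_rounds(outer_cycle_rounds, nstruct):
    """Total outer-cycle rounds across all output structures."""
    return outer_cycle_rounds * nstruct

tutorial_single = total_design_rounds(5, 1)     # one decoy with the shortened setting
pregenerated = total_design_rounds(5, 5)        # five decoys with the shortened setting
production = total_design_rounds(25, 1000)      # default rounds, minimum suggested nstruct

print(tutorial_single, pregenerated, production)  # 5 25 25000
```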
General Notes
```
	setenv PATH ${PATH}:${HOME}/rosetta_workshop/rosetta/main/source/tools
```
We will be using JSON output of the scorefile, as this is much easier to work with in python and pandas. We use the option -scorefile_format json
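To give a feel for the format, here is a small standard-library sketch that assumes one JSON object per line, which is also what the helper function later in this notebook assumes. The two lines and their values are made up; real scorefiles carry many more fields.

```python
import json
import math
import re

# Two made-up scorefile lines (hypothetical decoy names and values).
example_lines = [
    '{"decoy": "tut_0002", "total_score": -498.7, "dG_separated": nan}',
    '{"decoy": "tut_0001", "total_score": -512.3, "dG_separated": -21.5}',
]

def parse_score_line(line):
    """Parse one line; the json module accepts "NaN" but not a bare "nan"."""
    fixed = re.sub(r":\s*nan\b", ": NaN", line)
    return json.loads(fixed)

records = [parse_score_line(line) for line in example_lines]

# Sort by interface energy, pushing missing (NaN) values to the end.
records.sort(key=lambda r: (math.isnan(r["dG_separated"]), r["dG_separated"]))

for record in records:
    print(record["decoy"], record["dG_separated"])
```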

All of our common options for the tutorial are in the common file that you will copy to your working directory. Rosetta/PyRosetta will look for this file in your working directory or your home folder in the directory $HOME/.rosetta/flags. See this page for more info on using rosetta with custom config files: https://www.rosettacommons.org/docs/latest/rosetta_basics/running-rosetta-with-options#common-options-and-default-user-configuration

All tutorials have pre-generated output in outputs/rabd, and each lists its approximate run time on a single (Core i7) processor.

In [ ]:

# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

In [ ]:

from typing import *
import pandas
from pathlib import Path
import json
import re

#Functions we will be using. I like to collect any extra functions at the top of my notebook.
def load_json_scorefile(file_path: Path, sort_by: str="dG_separated") -> pandas.DataFrame:
    """
    Read scorefile lines as a dataframe, sorted by `sort_by` (dG_separated by default), with NaNs correctly replaced.
    """

    decoys = []
    # Use a context manager so the file handle is always closed.
    with open(file_path, 'r') as scorefile:
        for line in scorefile:
            if not line.strip():
                continue  # skip blank lines
            # json.loads understands "NaN" but not "nan".
            o = json.loads(line.replace("nan", "NaN"))
            decoys.append(o)
    local_df = pandas.DataFrame.from_dict(decoys)
    local_df = local_df.infer_objects()

    local_df = local_df.sort_values(sort_by, ascending=True)

    return local_df

def drop_cluster_columns(local_df: pandas.DataFrame, keep_cdrs: Optional[List[str]]=None) -> pandas.DataFrame:
        """
        Drop cluster columns that RAbD outputs to make it easier to work with the dataframe.
        """
        to_drop = []
        for column in local_df.columns:
            if re.search("cdr_cluster", column):
                skip=False
                if (keep_cdrs):
                    for cdr in keep_cdrs:
                        if re.search(cdr, column):
                            skip=True
                            break
                if not skip:
                    to_drop.append(column)
        return local_df.drop(columns=to_drop)

Imports¶

In [ ]:

#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
import os

#Core Includes
from rosetta.protocols.rosetta_scripts import *
from rosetta.protocols.antibody import *
from rosetta.protocols.antibody.design import *
from rosetta.utility import *

Initialization¶

Since we are sharing the working directory with all other notebooks, instead of using the common-configuration we spoke about in the introduction, we will be using the flags file located in the inputs directory.

In [ ]:

init('-no_fconfig @inputs/rabd/common')

In [ ]:

#Import a pose
pose = pose_from_pdb("inputs/rabd/my_ab.pdb")
original_pose = pose.clone()

Tutorial¶

Tutorial A: General Design¶

In many of these examples, we will use the XML interface to PyRosetta for simplicity with the AntibodyDesignMover, which is the actual C++ application wrapped as a mover. https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/antibodies/AntibodyDesignMover

Let's copy the files we need first:

	cp ../inputs/rabd/color_cdrs.pml .
	cp ../inputs/rabd/rabd.xml .

You are starting design on a new antibody that is not bound to the antigen in the crystal. This is difficult and risky, but we review how one could go about this anyway. We start by selecting a framework. Here, we use the trastuzumab framework, as it expresses well, is thermodynamically stable with a Tm of 69.5 degrees, and has repeatedly been shown to tolerate CDRs of different sequence and structure. Note that the energy of the complex is high, as we are starting from a manual placement of the antibody relative to the antigen. If we relax the structure too much, we will fall into an energy well that is hard to escape without significant sampling.

We are using an arbitrary protein at an arbitrary site for design. The PDB of our target is 1qaw. 1qaw is an oligomer of the TRP RNA-Binding Attenuation Protein from Bacillus stearothermophilus. It is usually a monomer/dimer, but at its multimeric interface is a tryptophan residue by itself.

It's a beautiful protein, with a cool mechanism. We will attempt to build an antibody to bind to two subunits to stabilize the dimeric state of the complex in the absence of TRP. Note that denovo design currently takes a large amount of processing power. Each tutorial below is more complex than the one before it. The examples we have for this tutorial are short runs to show HOW it can be done, but more outer_cycle_rounds and nstruct would produce far better models than the ones you will see here - as we will need to sample the relative orientation of the antibody-antigen complex through docking, the CDR clusters and lengths, the internal backbone degrees of freedom of the CDRs, as well as the sequence of the CDRs and possibly the framework. As you can tell, just the sampling problem alone is difficult. However, this will give you a basis for using RAbD on your own.

Tut A1. Sequence Design¶

Using the application is as simple as setting the -seq_design_cdrs option. This simply designs the listed CDRs (here L1 and L3 of the light chain) using CDR profiles, if they exist for those clusters, during flexible-backbone design. If profiles do not exist for a cluster (as is the case for H3 at the moment), we use conservative design by default. Note that InterfaceAnalyzer is run on each output decoy in the RAbD mover. Note that you can also set light_chain on the command line if you are only working on a single PDB through the Rosetta run.

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" light_chain="kappa"/>

This will take about a minute (50 seconds on my laptop). Output structures and scores are in outputs/rabd if you wish to copy them over - these include 4 more structures.
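Since the whole mover is configured through a single XML tag, one stray quote is enough to break it. A minimal sketch, using only Python's standard library, for building the tag with properly quoted attributes and checking that a string is well-formed before handing it to XmlObjects.static_get_mover. The helper names `mover_tag` and `is_well_formed` are my own for illustration, not part of PyRosetta.

```python
import xml.etree.ElementTree as ET

def mover_tag(tag, **attributes):
    """Build a self-closing XML tag with properly quoted attributes."""
    element = ET.Element(tag, {key: str(value) for key, value in attributes.items()})
    return ET.tostring(element, encoding="unicode")

def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False

rabd_xml = mover_tag("AntibodyDesignMover", name="RAbD",
                     seq_design_cdrs="L1,L3", light_chain="kappa")
print(rabd_xml)
print(is_well_formed(rabd_xml))

# A missing closing quote is caught before Rosetta ever sees the string.
print(is_well_formed('<AntibodyDesignMover name="RAbD" mintype="relax light_chain="kappa"/>'))
```

This only checks XML syntax; whether the attribute names are valid for the mover is still decided by Rosetta when the mover is created.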

In [ ]:

rabd = XmlObjects.static_get_mover('<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" light_chain="kappa"/>')
if not os.getenv("DEBUG"):
    rabd.apply(pose)

Now, for the sake of learning, how would we do this in code instead of XML? We just need to use setters.

In [ ]:

pose = original_pose.clone()
rabd2 = AntibodyDesignMover()

cdrs = vector1_protocols_antibody_CDRNameEnum()
cdrs.append(l1)
cdrs.append(l3)

rabd2.set_seq_design_cdrs(cdrs)
rabd2.set_light_chain("kappa")
if not os.getenv("DEBUG"):
    rabd2.apply(pose)

Score the input pose using the InterfaceAnalyzerMover

In [ ]:

from rosetta.protocols.analysis import InterfaceAnalyzerMover

if not os.getenv("DEBUG"):
    iam = InterfaceAnalyzerMover("LH_ABCDEFGIJKZ")
    iam.set_pack_separated(True)
    iam.apply(pose)
    iam.apply(original_pose)

    dg_term = "dG_separated"
    # Scores live in the pose.scores mapping for both poses.
    print("dG Diff:", pose.scores[dg_term] - original_pose.scores[dg_term])

Has the energy gone down after our sequence design? The dG_separated is calculated by scoring the complex, separating the antigen from the antibody, repacking side-chains at the interface, and then taking the difference in score - i.e. the dG.
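As a worked example of that difference, here is a tiny sketch with made-up numbers in Rosetta Energy Units. I am assuming the usual sign convention (complex minus separated), so a more negative dG means a more favorable interface.

```python
def dg_separated(complex_score, separated_score):
    """Interface energy: score of the complex minus score of the separated, repacked partners."""
    return complex_score - separated_score

# Hypothetical scores for the input pose and for a design.
dg_input = dg_separated(-500.0, -480.0)    # -20.0
dg_design = dg_separated(-510.0, -482.0)   # -28.0

# Negative difference: the design has the better interface energy.
print("dG Diff:", dg_design - dg_input)    # dG Diff: -8.0
```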

Let's take a look at scores from a previous run of 5 antibodies. The scorefiles are in JSON format, so it will be easy to turn them into pandas DataFrames and do some cool stuff. We'll do this often as the runtimes increase for our protocol - but all the scores in them can be accessed using the pose.scores attribute (which is PyRosetta-specific functionality).

Are any of these better than our input pose?

In [ ]:

df = load_json_scorefile("expected_outputs/rabd/tutA1_score.sc")
df = drop_cluster_columns(df, keep_cdrs=["L1", "L3"])
df

Tut A2. Graft Design¶

Now we will be enabling graft design AND sequence design on the L1 and L3 loops. With an nstruct (n decoys) of 5 and -outer_cycle_rounds of 5, we are doing 25 design trials total - i.e., 25 actual grafts.

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" graft_design_cdrs="L1,L3"/>

This will take about 2-3 times as long as sequence design, as grafting a non-breaking loop takes time. This was 738 seconds on my laptop to generate 5. Here, you will generate 1 in about 150 seconds. Output structures and scores are in ../expected_outputs/rabd.

Typically, we require a much higher -outer_cycle_rounds and number of decoys to see anything significant. Did this improve energies in your single antibody? How about our pre-generated ones? Load and take a look at the scorefile as a pandas DataFrame as we did above (expected_outputs/rabd/tutA2_score.sc). Name the result df_a2, since the merge cell further down expects that variable.

In [ ]:

# YOUR CODE HERE
raise NotImplementedError()

Let's merge these dataframes, sort by dG_separated, and see if any of our graft-design models did better.

In [ ]:

df_tut_a12 = pandas.concat([df, df_a2], ignore_index=True).sort_values("dG_separated", ascending=True)
df_tut_a12

Take a look at the lowest (dG) scoring pose in PyMOL - do you see any difference in the L1 and L3 loops there? Do they make better contact than what we had before?

Let's take a look in PyMOL.

		pymol inputs/rabd/my_ab.pdb inputs/rabd/tutA2_* 
		@color_cdrs.pml
		center full_epitope

How different are the L1 and L3 loops? Have any changed length?

Let's take a look at the clusters in our dataframe. Have they changed from the native?

In [ ]:

if not os.getenv("DEBUG"):
    print("L1", original_pose.scores["cdr_cluster_ID_L1"])
    print("L3", original_pose.scores["cdr_cluster_ID_L3"])

Tut A3. Basic De-novo run¶

Here, we want to do a de novo run (without docking), starting with random CDRs grafted in instead of whatever we have in the antibody to start with (only for the CDRs that are actually undergoing graft design). This is useful, as we start the design with very high energy and work our way down. Note that since this is an entirely new interface for our model protein, this interface is already at a very high energy, so a random start is less needed here, but it is worth knowing how to do it (139 seconds on my laptop). Do this below as you have done in other tutorials - either through code or XML.

<AntibodyDesignMover name="RAbD" graft_design_cdrs="L1,L3" seq_design_cdrs="L1,L3" random_start="1"/>

In [ ]:

if not os.getenv("DEBUG"):
    # YOUR CODE HERE
    raise NotImplementedError()

Would starting from a random CDR help anywhere? Perhaps if you want an entirely new cluster or length to break a patent or remove some off target effects? We will use it below to start de novo design with docking.

Tut A4. RAbD Framework Components¶

This tutorial will give you some experience with an antibody design protocol using the RosettaAntibodyDesign components. We will take the light chain CDRs from a malaria antibody and graft them into our antibody. In this tutorial we are interested in stabilizing the grafted CDRs in relation to the whole antibody, instead of interface design to an antigen.

We will graft the CDRs in, minimize the structure with CDR dihedral constraints (that use the CDR clusters) so as not to perturb the CDRs too much, and then design the framework around the CDRs while designing the CDRs and neighbors. The result should be mutations that better accommodate our new CDRs. This can be useful for humanizing potential antibodies or framework switching, where we want the binding properties of certain CDRs, but the stability or immunological profile of a different framework.

We are using an XML here for simplicity - all components are available in PyRosetta, but harder to set up.

1. Copy the Files¶

	cp ../inputs/rabd/ab_design_components.xml .
	cp ../inputs/rabd/malaria_cdrs.pdb .

Take a look at the xml.

We are using the AntibodyCDRGrafter to do the grafting of our CDRs.
We then add constraints using a CDRDihedralConstraintMover for each CDR, which uses the CDR clusters determined by RosettaAntibody to keep from perturbing the CDRs too much.
Finally, we do a round of pack/min/pack using the RestrictToCDRsAndNeighborsOperation and the CDRResidueSelector. This task operation controls what we pack and design. It first limits packing and design to only the CDRs and its neighbors. By specifying the design_framework=1 option we allow the neighbor framework residues to design, while the CDRs and antigen neighbors will only repack. If we wanted to disable antigen repacking, we would pass the DisableAntibodyRegionOperation task operation. Using this, we can specify any antibody region as antibody_region, cdr_region, or antigen_region and we can disable just design or both packing and design.

These task operations allow us to chisel exactly what we want to design in the antibody, sans a residue-specific resfile (though we could combine these with one of them!). All of these tools are available in-code. If you've done the design workshop, you will know how to use them here. Check out rosetta.protocols.antibody.task_operations for a list of them. Finally, we use the new SimpleMetric system to obtain our final sequence of the CDRs to compare to our native antibody as well as PyMOL selections of our CDRs - which you have been introduced to in the previous tutorial.

PyRosetta Locations¶

rosetta.protocols.antibody.task_operations

rosetta.protocols.antibody.constraints

rosetta.protocols.antibody.residue_selectors

Documentation¶

2. Run the protocol or copy the output (357 seconds).¶

3. Look at the score file as you have before. Are the sequences different between what we started with? How about the interaction energies?¶

In [ ]:

os.system('cp inputs/rabd/malaria_cdrs.pdb .')

In [ ]:

if not os.getenv("DEBUG"):
    pose = original_pose.clone()
    parser = RosettaScriptsParser()
    protocol = parser.generate_mover_and_apply_to_pose(pose, "inputs/rabd/ab_design_components.xml")
    protocol.apply(pose)

Challenge: Custom Design Protocol in code¶

If you want a challenge - try to set these up in-code without RosettaScripts. It can be tricky - which is why I made PyRosetta finally work optionally with RosettaScripts. It's good to know how to use both.

In [ ]:

# YOUR CODE HERE
raise NotImplementedError()

Tutorial B: Optimizing Interface Energy (opt-dG)¶

Tut B1. Optimizing dG¶

Here, we want to set the protocol to optimize the interface energy during Monte Carlo instead of total energy. The interface energy is calculated by the InterfaceAnalyzerMover through a specialized MonteCarlo object called MonteCarloInterface. This is useful to improve binding energy and generally results in better interface energies. Resulting models should still be pruned for high total energy. This was benchmarked in the paper, and has been used for real-life designs to the HIV epitope (165 seconds for 1 decoy).

Use the provided XML or set this up through code.

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" graft_design_cdrs="L1,L3" mc_optimize_dG="1" />

In [ ]:

if not os.getenv("DEBUG"):
    # YOUR CODE HERE
    raise NotImplementedError()

Load the scorefile with nstruct=5 from expected_outputs/rabd/tutB1_score.sc

Compare this data to tutorial A2. Are the interface energies better? Has the shape complementarity (sc score) improved?

Tut B2. Optimizing Interface Energy and Total Score (opt-dG and opt-E)¶

Here, we want to set the protocol to optimize the interface energy during Monte Carlo, but we want to add some total energy to the weight. Because total energy is much larger in magnitude than interface energy and would otherwise dominate the combined score, we only add a small weight for total energy. This has not been fully benchmarked, but if your models have very bad total energy when using opt-dG - consider using it. (178 sec for 1 nstruct)

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" graft_design_cdrs="L1,L3" mc_optimize_dG="1" mc_total_weight=".01" mc_interface_weight=".99" light_chain="kappa"/>
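To see why the total-energy weight is kept small, here is a toy calculation. It assumes the two terms are combined as a simple weighted sum, which is my reading of the option names rather than something spelled out here, and the scores for the two designs are made up.

```python
import math

def combined_mc_score(total_score, dg, total_weight=0.01, interface_weight=0.99):
    """Assumed form: a weighted sum of total energy and interface energy."""
    return total_weight * total_score + interface_weight * dg

# Hypothetical designs: A has the better total score, B the better interface.
design_a = {"total_score": -500.0, "dG": -20.0}
design_b = {"total_score": -300.0, "dG": -25.0}

for weights in [(0.01, 0.99), (1.0, 1.0)]:
    score_a = combined_mc_score(design_a["total_score"], design_a["dG"], *weights)
    score_b = combined_mc_score(design_b["total_score"], design_b["dG"], *weights)
    preferred = "A" if score_a < score_b else "B"
    print(weights, round(score_a, 2), round(score_b, 2), "prefers", preferred)
```

With equal weights the large total score decides everything; with the small weight the interface energy still drives the choice, and total energy only nudges it.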

In [ ]:

if not os.getenv("DEBUG"):
    # YOUR CODE HERE
    raise NotImplementedError()

Use the scorefile from an nstruct=5 run to compare total energies (total_score) of this run vs the one right before it located at expected_outputs/rabd/tutB2_score.sc. Are the total scores better?

Tutorial C: Towards DeNovo Design: Integrated Dock/Design¶

This tutorial takes a long time to run as docking is fairly slow - even with the optimizations that are part of RAbD. PLEASE USE THE PREGENERATED OUTPUT. The top 10 designs from each tutorial and associated scorefiles of a 1000 nstruct cluster run are in the output directory. Note that we are starting these tutorials with a pre-relaxed structure in order to get more reliable rosetta energies. Since we are running a large nstruct, we will escape the local energy well that this leads us into.

Tut C1. RosettaDocking¶

In this example, we use integrated RosettaDock (with sequence design during the high-res step) to sample the antibody-antigen orientation, but we don't care where the antibody binds to the antigen, just that it binds - i.e., no epitope constraints. The RAbD protocol always has at least Paratope SiteConstraints enabled to make sure any docking is contained to the paratope (like most good docking programs).

This takes too long to run, so PLEASE USE THE OUTPUT GENERATED FOR YOU. We will use opt-dG here, and for these tutorials we will be sequence-designing all CDRs to begin to create a better interface. Note that sequence design happens wherever packing occurs - including during high-resolution docking.

<AntibodyDesignMover name="RAbD" mc_optimize_dG="1" do_dock="1" seq_design_cdrs="L1,L2,L3,H1,H2,H3" graft_design_cdrs="L1,L2,L3,H1,H2" light_chain="kappa"/>

Use pymol to load the files, and load tutC1_score from the expected_outputs directory as a pandas dataframe.

	pymol my_ab.pdb expected_outputs/rabd/top10_C1/* 
	@color_cdrs.pml
	center full_epitope

Where is the antibody in the resulting designs? Are the interfaces restricted to the Paratope? Has the epitope moved relative to the starting interface?

Tut C2. Auto Epitope Constraints¶

Allow Dock-Design, incorporating auto-generated SiteConstraints to keep the antibody around the starting interface residues. These residues are determined by being within 6 Å of the CDR residues (this interface distance can be customized). Again, these results are provided for you.

<AntibodyDesignMover name="RAbD" mc_optimize_dG="1" do_dock="1" use_epitope_csts="1" 
   seq_design_cdrs="L1,L2,L3,H1,H2,H3" graft_design_cdrs="L1,L2,L3,H1,H2" light_chain="kappa"/>
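The distance rule behind the automatic epitope can be sketched with toy coordinates. This is only an illustration of a 6 Å cutoff: the coordinates are made up, the function is hypothetical, and which atoms RAbD actually measures between is not specified here.

```python
import math

def find_epitope(cdr_atoms, antigen_atoms, cutoff=6.0):
    """Return antigen residue labels with any atom within `cutoff` Å of any CDR atom."""
    epitope = set()
    for label, coord in antigen_atoms:
        if any(math.dist(coord, cdr_coord) <= cutoff for cdr_coord in cdr_atoms):
            epitope.add(label)
    return sorted(epitope)

# Made-up CDR atom coordinates (Å).
cdr_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Made-up antigen atoms as (residue label, coordinate) pairs.
antigen_atoms = [
    ("38J", (0.0, 5.0, 0.0)),    # 5.0 Å from the first CDR atom
    ("38J", (0.0, 9.0, 0.0)),    # too far, but 38J already qualifies
    ("52J", (3.0, 0.0, 5.5)),    # 5.5 Å from the second CDR atom
    ("34K", (20.0, 0.0, 0.0)),   # far from both
]

print(find_epitope(cdr_atoms, antigen_atoms))  # ['38J', '52J']
```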

Use PyMOL to load the files and check out the scores in expected_outputs/rabd/tutC2_score.sc as before.

pymol my_ab.pdb expected_outputs/rabd/top10_C2/* 
@color_cdrs.pml
center full_epitope

How do these compare with the previous tutorial? Are the antibodies closer to the starting interface? Are the scores better?

Tut C3. Specific Residue Epitope Constraints¶

Allow Dock-Design, as above, but specify the Epitope Residues and Paratope CDRs to guide the dock/design to have these interact.

For now, we are more focused on the light chain. We could do this as a two-stage process, where we first optimize positioning and CDRs of the light chain and then the heavy chain or simply add heavy chain CDRs to the paratope CDRs option.

<AntibodyDesignMover name="RAbD" mc_optimize_dG="1" do_dock="1" use_epitope_csts="1" 
	epitope_residues="38J,52J,34K,37K" paratope_cdrs="L1,L3" 
	seq_design_cdrs="L1,L2,L3,H1,H2,H3" graft_design_cdrs="L1,L2,L3,H1,H2" light_chain="kappa"/>
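To make the epitope_residues string above easier to read, here is a small parsing sketch that splits each entry into a residue number and a chain. The parser is my own illustration and only handles the plain number-plus-chain form used in this example; it ignores insertion codes.

```python
import re
from collections import defaultdict

def parse_epitope_residues(spec):
    """Split a string like "38J,52J" into (residue number, chain) tuples."""
    residues = []
    for token in spec.split(","):
        match = re.fullmatch(r"(-?\d+)([A-Za-z])", token.strip())
        if match is None:
            raise ValueError(f"Cannot parse residue: {token!r}")
        residues.append((int(match.group(1)), match.group(2)))
    return residues

residues = parse_epitope_residues("38J,52J,34K,37K")

# Group by chain to see which antigen subunits the epitope spans.
by_chain = defaultdict(list)
for number, chain in residues:
    by_chain[chain].append(number)

print(dict(by_chain))  # {'J': [38, 52], 'K': [34, 37]}
```

The epitope here spans two chains (J and K), which matches the goal of binding across two subunits.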

Again, load these into Pymol and take a look at the scorefile in a dataframe.

pymol my_ab.pdb expected_outputs/rabd/top10_C3/* 
@color_cdrs.pml
center full_epitope

Now that we have specified where we want the interface to be and are additionally designing more CDRs, how do the energies compare? Are we starting to get a decent interface with the lowest energy structure?

How do these compare with the previous runs?

Tutorial D: Advanced Settings and CDR Instruction File Customization¶

Once again, all output files are in expected_outputs. Please use these if you want - as many of these take around 10 minutes to run.

Tut D1. CDR Instruction File¶

More complicated design runs can be created by using the Antibody Design Instruction file. This file allows complete customization of the design run. See below for a review of the syntax of the file and possible customization. An instruction file is provided where we use conservative design on L1 and graft in L1, H2, and H1 CDRs at a longer length to attempt to create a larger interface area. More info on instruction file syntax can be found at the end of this tutorial. (150 seconds on my laptop for nstruct 1)

cp ../inputs/my_instruction_file.txt .
cp ../inputs/default_cdr_instructions.txt .

Take a look at the default CDR instructions. These are loaded by default into Rosetta. There are syntax examples at the end of the file. Run the XML or attempt to use it in-code.

<AntibodyDesignMover name="RAbD" instruction_file="my_instruction_file.txt" 
   seq_design_cdrs="L1,L3,H1,H2,H3" graft_design_cdrs="L1,H2,H1" random_start="1" light_chain="kappa"/>

In [ ]:

if not os.getenv("DEBUG"):
    # YOUR CODE HERE
    raise NotImplementedError()

Tut D2. Disallow AAs and the Resfile¶

Here, we will disallow ANY sequence design into Proline residues and Cysteine residues, while giving a resfile to further LIMIT design and packing at specific positions. These can be given as 3 or 1 letter codes, and mixed codes such as PRO and C are accepted. Note that the resfile does NOT turn any residues ON; it is simply used to optionally LIMIT design residue types and design and packing positions.

Resfile syntax can be found here: [https://www.rosettacommons.org/docs/wiki/rosetta_basics/file_types/resfiles] Note that specifying a resfile and disallowing amino acids are currently only available as cmd-line options that are read by RAbD.

Runtime is less than a minute for nstruct 1.

cp ../inputs/rabd/my_resfile.resfile .

Take a look at the resfile. Can you describe what it is we are doing with it? Unfortunately, at the moment, resfile setting is only available as a cmd-line option that needs to be set in the init() function as -resfile my_resfile.resfile

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3,H1,H2,H3" light_chain="kappa"/>
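Since these settings have to go through init(), one way to keep things tidy is to assemble the flag string in a small helper. This is just a sketch: the helper name is hypothetical, the base flags are the ones used at initialization in this notebook, and the -resfile and -disallow_aa options are the ones named in this tutorial.

```python
def build_flags(base="-no_fconfig @inputs/rabd/common", resfile=None, disallow_aa=()):
    """Assemble an option string to pass to init()."""
    parts = [base]
    if resfile:
        parts += ["-resfile", resfile]
    if disallow_aa:
        parts += ["-disallow_aa", *disallow_aa]
    return " ".join(parts)

flags = build_flags(resfile="my_resfile.resfile", disallow_aa=["PRO", "CYS"])
print(flags)
# In a PyRosetta session, this string would then be passed as: init(flags)
```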

Tut D3. Mintype¶

Here, we will change the mintype to relax. This mintype enables Flexible-Backbone design as we have seen in previous workshops. Our default is to use min/pack cycles, but relax typically works better. However, it also takes considerably more time! This tutorial takes about 339 seconds for one struct!

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3,H1,H3" mintype="relax" light_chain="kappa"/>

In [ ]:

if not os.getenv("DEBUG"):
    # YOUR CODE HERE
    raise NotImplementedError()

Tut D4. Framework Design¶

Finally, we want to allow the framework residues AROUND the CDRs we will be designing and any interacting antigen residues to design as well here. In addition, we will disable conservative framework design as we want something funky (this is not typically recommended and is used here to indicate what you CAN do). Note that we will also design the interface of the antigen using the -design_antigen option. This can be useful for vaccine design. Note that these design options are currently cmd-line only (but will be available in a later version of Rosetta). Approx 900 second runtime.

antibody_designer.linuxgccrelease -s my_ab.pdb -seq_design_cdrs L1 L3 H1 H3 \
		    -light_chain kappa -resfile my_resfile.resfile -disallow_aa PRO CYS \
		    -mintype relax -design_antigen -design_framework \
		    -conservative_framework_design false -nstruct 1 -out:prefix tutD4_

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3,H1,H3" mintype="relax" light_chain="kappa"/>

Tut D5. H3 Stem, kT, and Sequence variability.¶

Finally, we want increased variability for our sequence designs. So, we will increase number of sampling rounds for our lovely cluster profiles using the -seq_design_profile_samples option, increase kT, and allow H3 stem design.

We will enable H3 Stem design here, which can cause a flipping of the H3 stem type from bulged to non-bulged and vice-versa. Typically, if you do this, you may want to run loop modeling on the top designs to confirm the H3 structure remains intact. Note that once again, these sequence-design specific options must be set on the cmd-line.

Description of the seq_design_profile_samples option (default 1): "If designing using profiles, this is the number of times the profile is sampled each time packing done. Increase this number to increase variability of designs - especially if not using relax as the mintype."

This tutorial takes approx 450 seconds.

antibody_designer.linuxgccrelease -s my_ab.pdb -seq_design_cdrs L1 L2 H3 \
		    -graft_design_cdrs L1 L2 -light_chain kappa -design_H3_stem -inner_kt 2.0 \
		    -outer_kt 2.0 -seq_design_profile_samples 5 -nstruct 5 -out:prefix tutD5_

<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3,H1,H3" mintype="relax" 
   inner_kt="2.0" outer_kt="2.0"/>
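To see what raising kT does, here is the Metropolis acceptance probability for the same uphill move at two temperatures. The energy difference of 2.0 and the comparison value of kT = 1.0 are illustrative numbers only, not documented defaults.

```python
import math

def acceptance_probability(delta_e, kt):
    """Metropolis probability of accepting a move that changes the energy by delta_e."""
    if delta_e <= 0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta_e / kt)

for kt in (1.0, 2.0):
    print(kt, round(acceptance_probability(2.0, kt), 3))
# 1.0 0.135
# 2.0 0.368
```

A higher kT accepts more uphill moves, which is where the extra sequence and structure variability comes from.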

How different are the sequences of L1, L2, and H3 from our starting antibody?

You should now be ready to explore and use RosettaAntibodyDesign on your own. Congrats! Thanks for going through this tutorial!

The full reference manual can be found here: https://www.rosettacommons.org/docs/latest/application_documentation/antibody/RosettaAntibodyDesign#antibody-design-cdr-instruction-file
