Keywords: CDRResidueSelector
In this workshop we will learn how to use the RosettaAntibody framework. Unfortunately, the full RosettaAntibody modeling code is not available in PyRosetta, because it is built around an application. To use it, you will need either the ROSIE server or the Rosetta application.
For a full overview of the RosettaAntibody modeling application, see this paper: https://www.ncbi.nlm.nih.gov/pubmed/28125104
Snugdock and the H3 modeling component of RosettaAntibody are available here as movers.
!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()
Make sure you are in the directory with the pdb files:
cd google_drive/MyDrive/student-notebooks/
Let's import the antibody namespace so we can start using it. Take a look at the different modules that are part of the antibody module.
Note that we can also do from rosetta.protocols.antibody import * in order to make accessing the enums much easier. For the purpose of this workshop, we will use antibody to traverse the contents. This makes it easier for you to use tab completion for exploration.
#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
#Core Includes
from rosetta.core.select import residue_selector as selections
from rosetta.protocols import antibody
Here, we will initialize a typical run of Rosetta. We could use the -input_ab_scheme option with AHo_Scheme, but we will instead learn to pass this to our main antibody framework code.
init('-use_input_sc -ignore_unrecognized_res \
-ignore_zero_occupancy false -load_PDB_components false -no_fconfig')
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.33+release.1e60c63beb532fd475f0f704d68d462b8af2a977 2019-08-09T15:19:57] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team. core.init: Rosetta version: PyRosetta4.Release.python36.mac r230 2019.33+release.1e60c63beb5 1e60c63beb532fd475f0f704d68d462b8af2a977 http://www.pyrosetta.org 2019-08-09T15:19:57 core.init: command: PyRosetta -use_input_sc -ignore_unrecognized_res -ignore_zero_occupancy false -load_PDB_components false -no_fconfig -database /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=967592561 seed_offset=0 real_seed=967592561 basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=967592561 RG_type=mt19937
Let's load an antibody. This is the same antibody we used to learn packing and design. :)
#Import a pose
pose = pose_from_pdb("inputs/2r0l_1_1.pdb")
original_pose = pose.clone()
core.import_pose.import_pose: File 'inputs/2r0l_1_1.pdb' automatically determined to be of type PDB core.conformation.Conformation: [ WARNING ] missing heavyatom: OXT on residue ARG:CtermProteinFull 108 core.conformation.Conformation: [ WARNING ] missing heavyatom: OXT on residue SER:CtermProteinFull 225 core.conformation.Conformation: [ WARNING ] missing heavyatom: OXT on residue ARG:CtermProteinFull 464 core.conformation.Conformation: Found disulfide between residues 23 88 core.conformation.Conformation: current variant for 23 CYS core.conformation.Conformation: current variant for 88 CYS core.conformation.Conformation: current variant for 23 CYD core.conformation.Conformation: current variant for 88 CYD core.conformation.Conformation: Found disulfide between residues 130 204 core.conformation.Conformation: current variant for 130 CYS core.conformation.Conformation: current variant for 204 CYS core.conformation.Conformation: current variant for 130 CYD core.conformation.Conformation: current variant for 204 CYD core.conformation.Conformation: Found disulfide between residues 250 266 core.conformation.Conformation: current variant for 250 CYS core.conformation.Conformation: current variant for 266 CYS core.conformation.Conformation: current variant for 250 CYD core.conformation.Conformation: current variant for 266 CYD core.conformation.Conformation: Found disulfide between residues 258 328 core.conformation.Conformation: current variant for 258 CYS core.conformation.Conformation: current variant for 328 CYS core.conformation.Conformation: current variant for 258 CYD core.conformation.Conformation: current variant for 328 CYD core.conformation.Conformation: Found disulfide between residues 353 422 core.conformation.Conformation: current variant for 353 CYS core.conformation.Conformation: current variant for 422 CYS core.conformation.Conformation: current variant for 353 CYD core.conformation.Conformation: current variant for 422 CYD core.conformation.Conformation: 
Found disulfide between residues 385 401 core.conformation.Conformation: current variant for 385 CYS core.conformation.Conformation: current variant for 401 CYS core.conformation.Conformation: current variant for 385 CYD core.conformation.Conformation: current variant for 401 CYD core.conformation.Conformation: Found disulfide between residues 412 440 core.conformation.Conformation: current variant for 412 CYS core.conformation.Conformation: current variant for 440 CYS core.conformation.Conformation: current variant for 412 CYD core.conformation.Conformation: current variant for 440 CYD
The main tool that we will use is the AntibodyInfo object. This allows you to get a TON of information about the antibody to use in various custom protocols.
Note that this antibody has already been renumbered using the PyIgClassify server.
Since we are not defining the numbering scheme and CDR definition during init, we will need to pass the corresponding enums to the AntibodyInfo object.
ab_info = antibody.AntibodyInfo(pose, antibody.AHO_Scheme, antibody.North)
basic.io.database: Database file opened: sampling/antibodies/cluster_center_dihedrals.txt protocols.antibody.AntibodyNumberingParser: Antibody numbering scheme definitions read successfully protocols.antibody.AntibodyNumberingParser: Antibody CDR definition read successfully antibody.AntibodyInfo: Successfully finished the CDR definition antibody.AntibodyInfo: AC Detecting Regular CDR H3 Stem Type antibody.AntibodyInfo: ARFWWRSFDYW antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: KINKED antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: Kink: 1 Extended: 0 antibody.AntibodyInfo: Setting up CDR Cluster for H1 protocols.antibody.cluster.CDRClusterMatcher: Length: 13 Omega: TTTTTTTTTTTTT antibody.AntibodyInfo: Setting up CDR Cluster for H2 protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT antibody.AntibodyInfo: Setting up CDR Cluster for H3 protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT antibody.AntibodyInfo: Setting up CDR Cluster for L1 protocols.antibody.cluster.CDRClusterMatcher: Length: 11 Omega: TTTTTTTTTTT antibody.AntibodyInfo: Setting up CDR Cluster for L2 protocols.antibody.cluster.CDRClusterMatcher: Length: 8 Omega: TTTTTTTT antibody.AntibodyInfo: Setting up CDR Cluster for L3 protocols.antibody.cluster.CDRClusterMatcher: Length: 9 Omega: TTTTTTCTT
Let's take a look at what AntibodyInfo prints.
print(ab_info)
//////////////////////////////////////////////////////////////////////////////// /// Rosetta Antibody Info /// /// /// /// Antibody Type: Regular Antibody /// Light Chain Type: unknown /// Predict H3 Cterminus Base: KINKED /// /// H1 info: /// length: 13 /// sequence: AASGFTISNSGIH /// north_cluster: H1-13-1 /// loop_info: LOOP start: 131 stop: 143 cut: 137 size: 13 skip rate: 0 extended?: False /// H2 info: /// length: 10 /// sequence: WIYPTGGATD /// north_cluster: H2-10-1 /// loop_info: LOOP start: 158 stop: 167 cut: 163 size: 10 skip rate: 0 extended?: False /// H3 info: /// length: 10 /// sequence: ARFWWRSFDY /// north_cluster: H3-10-1 /// loop_info: LOOP start: 205 stop: 214 cut: 206 size: 10 skip rate: 0 extended?: False /// L1 info: /// length: 11 /// sequence: RASQDVSTAVA /// north_cluster: L1-11-1 /// loop_info: LOOP start: 24 stop: 34 cut: 29 size: 11 skip rate: 0 extended?: False /// L2 info: /// length: 8 /// sequence: YSASFLYS /// north_cluster: L2-8-1 /// loop_info: LOOP start: 49 stop: 56 cut: 53 size: 8 skip rate: 0 extended?: False /// L3 info: /// length: 9 /// sequence: QQSYTTPPT /// north_cluster: L3-9-cis7-1 /// loop_info: LOOP start: 89 stop: 97 cut: 93 size: 9 skip rate: 0 extended?: False ////////////////////////////////////////////////////////////////////////////////
Isn't that AWESOME!! I think so. But I wrote a lot of that code!
Anyway, as you can see, you can get a fair bit of information out of the AntibodyInfo object. In fact, most antibody-related code either takes an AntibodyInfo object or constructs one from the numbering scheme, CDR definition, and pose passed to it. You will see this as we go.
Note the north_cluster here. This is useful in some modeling tasks, but becomes much more relevant during antibody design. More information on what we mean by north_cluster can be found in this paper, if you want to read ahead a bit. https://www.ncbi.nlm.nih.gov/pubmed/21035459
Now, let's use the AntibodyInfo class to get a bit of useful information out of our antibody.
print("h1", ab_info.get_CDR_start(antibody.h1, pose))
print("h2", ab_info.get_CDR_end(antibody.h2, pose))
h1 131 h2 167
Now let's use these enums a bit more. They go in order from 1 to 8, with 7 and 8 being the CDR4 loops, also known as DE loops. We won't worry about them just yet.
for i in range(1, 7):
    print(i, ab_info.get_CDR_name(antibody.CDRNameEnum(i)))
for cdr in ['L1', 'l1', 'L2', 'l2', 'L3', 'H1', 'H2', 'H3']:
    print(cdr, str(ab_info.get_CDR_name_enum(cdr)))
print(str(antibody.h3))
print(int(antibody.h3))
1 H1 2 H2 3 H3 4 L1 5 L2 6 L3 L1 CDRNameEnum.l1 l1 CDRNameEnum.l1 L2 CDRNameEnum.l2 l2 CDRNameEnum.l2 L3 CDRNameEnum.l3 H1 CDRNameEnum.h1 H2 CDRNameEnum.h2 H3 CDRNameEnum.h3 CDRNameEnum.h3 3
Does this make enums a bit less confusing? They are named integers. The last two calls print the CDR name enum itself and then the integer behind it. The cool thing here is that we can loop through all of the CDRs just by using the range 1-6, and Rosetta will understand it.
Note that we convert the integer into a CDRNameEnum in the function call. If we are storing the CDR name enums as keys of a dictionary or indexes of a list, we don't need this. The conversion is there simply so that the C++ code works properly.
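To make the "named integers" idea concrete, here is a minimal pure-Python sketch that runs without PyRosetta. The CDRName class and the name_to_enum helper are hypothetical stand-ins, not the Rosetta CDRNameEnum; only the six values mirror the order printed above, and the lengths are copied from the AntibodyInfo printout.

```python
from enum import IntEnum


# Hypothetical stand-in for antibody.CDRNameEnum; values follow the printed order.
class CDRName(IntEnum):
    h1 = 1
    h2 = 2
    h3 = 3
    l1 = 4
    l2 = 5
    l3 = 6


def name_to_enum(name):
    """Case-insensitive lookup, so 'L1' and 'l1' both resolve to l1."""
    return CDRName[name.lower()]


# Looping over the integers 1-6 visits every CDR in enum order.
names = [CDRName(i).name.upper() for i in range(1, 7)]
print(names)

# An IntEnum member behaves like its integer, so it can index a list directly.
# Index 0 is unused; the lengths are copied from the AntibodyInfo printout.
lengths = [None, 13, 10, 10, 11, 8, 9]
print(name_to_enum("L1").name, lengths[CDRName.l1])
```

In plain Python the member can be used as a list index or dictionary key without any conversion, which is the situation the note above describes.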
So we have seen that we can do some of this directly within AntibodyInfo itself. Cool. But what if we need something more advanced? Let's use the class that actually does all this conversion.
enum_manager = antibody.AntibodyEnumManager()
print(enum_manager.numbering_scheme_enum_to_string(antibody.AHO_Scheme))
print(enum_manager.cdr_definition_enum_to_string(antibody.North))
print(enum_manager.cdr_name_string_to_enum("H1"))
print(enum_manager.antibody_region_enum_to_string(antibody.framework_region))
AHO_Scheme North CDRNameEnum.h1 framework_region
Use the functions get_region_of_residue and get_CDRNameEnum_of_residue, together with the manager, to traverse the antibody and print the region of every residue in the pose.
### BEGIN SOLUTION
for i in range(1, pose.size()+1):
    region = ab_info.get_region_of_residue(pose, i)
    if (region == antibody.cdr_region):
        print(i, enum_manager.cdr_name_enum_to_string(ab_info.get_CDRNameEnum_of_residue(pose, i)))
    else:
        print(i, enum_manager.antibody_region_enum_to_string(region))
### END SOLUTION
1 framework_region 2 framework_region 3 framework_region 4 framework_region 5 framework_region 6 framework_region 7 framework_region 8 framework_region 9 framework_region 10 framework_region 11 framework_region 12 framework_region 13 framework_region 14 framework_region 15 framework_region 16 framework_region 17 framework_region 18 framework_region 19 framework_region 20 framework_region 21 framework_region 22 framework_region 23 framework_region 24 L1 25 L1 26 L1 27 L1 28 L1 29 L1 30 L1 31 L1 32 L1 33 L1 34 L1 35 framework_region 36 framework_region 37 framework_region 38 framework_region 39 framework_region 40 framework_region 41 framework_region 42 framework_region 43 framework_region 44 framework_region 45 framework_region 46 framework_region 47 framework_region 48 framework_region 49 L2 50 L2 51 L2 52 L2 53 L2 54 L2 55 L2 56 L2 57 framework_region 58 framework_region 59 framework_region 60 framework_region 61 framework_region 62 framework_region 63 framework_region 64 framework_region 65 framework_region 66 framework_region 67 framework_region 68 framework_region 69 framework_region 70 framework_region 71 framework_region 72 framework_region 73 framework_region 74 framework_region 75 framework_region 76 framework_region 77 framework_region 78 framework_region 79 framework_region 80 framework_region 81 framework_region 82 framework_region 83 framework_region 84 framework_region 85 framework_region 86 framework_region 87 framework_region 88 framework_region 89 L3 90 L3 91 L3 92 L3 93 L3 94 L3 95 L3 96 L3 97 L3 98 framework_region 99 framework_region 100 framework_region 101 framework_region 102 framework_region 103 framework_region 104 framework_region 105 framework_region 106 framework_region 107 framework_region 108 framework_region 109 framework_region 110 framework_region 111 framework_region 112 framework_region 113 framework_region 114 framework_region 115 framework_region 116 framework_region 117 framework_region 118 framework_region 119 framework_region 
120 framework_region 121 framework_region 122 framework_region 123 framework_region 124 framework_region 125 framework_region 126 framework_region 127 framework_region 128 framework_region 129 framework_region 130 framework_region 131 H1 132 H1 133 H1 134 H1 135 H1 136 H1 137 H1 138 H1 139 H1 140 H1 141 H1 142 H1 143 H1 144 framework_region 145 framework_region 146 framework_region 147 framework_region 148 framework_region 149 framework_region 150 framework_region 151 framework_region 152 framework_region 153 framework_region 154 framework_region 155 framework_region 156 framework_region 157 framework_region 158 H2 159 H2 160 H2 161 H2 162 H2 163 H2 164 H2 165 H2 166 H2 167 H2 168 framework_region 169 framework_region 170 framework_region 171 framework_region 172 framework_region 173 framework_region 174 framework_region 175 framework_region 176 framework_region 177 framework_region 178 framework_region 179 framework_region 180 framework_region 181 framework_region 182 framework_region 183 framework_region 184 framework_region 185 framework_region 186 framework_region 187 framework_region 188 framework_region 189 framework_region 190 framework_region 191 framework_region 192 framework_region 193 framework_region 194 framework_region 195 framework_region 196 framework_region 197 framework_region 198 framework_region 199 framework_region 200 framework_region 201 framework_region 202 framework_region 203 framework_region 204 framework_region 205 H3 206 H3 207 H3 208 H3 209 H3 210 H3 211 H3 212 H3 213 H3 214 H3 215 framework_region 216 framework_region 217 framework_region 218 framework_region 219 framework_region 220 framework_region 221 framework_region 222 framework_region 223 framework_region 224 framework_region 225 framework_region 226 antigen_region 227 antigen_region 228 antigen_region 229 antigen_region 230 antigen_region 231 antigen_region 232 antigen_region 233 antigen_region 234 antigen_region 235 antigen_region 236 antigen_region 237 antigen_region 238 
antigen_region 239 antigen_region 240 antigen_region 241 antigen_region 242 antigen_region 243 antigen_region 244 antigen_region 245 antigen_region 246 antigen_region 247 antigen_region 248 antigen_region 249 antigen_region 250 antigen_region 251 antigen_region 252 antigen_region 253 antigen_region 254 antigen_region 255 antigen_region 256 antigen_region 257 antigen_region 258 antigen_region 259 antigen_region 260 antigen_region 261 antigen_region 262 antigen_region 263 antigen_region 264 antigen_region 265 antigen_region 266 antigen_region 267 antigen_region 268 antigen_region 269 antigen_region 270 antigen_region 271 antigen_region 272 antigen_region 273 antigen_region 274 antigen_region 275 antigen_region 276 antigen_region 277 antigen_region 278 antigen_region 279 antigen_region 280 antigen_region 281 antigen_region 282 antigen_region 283 antigen_region 284 antigen_region 285 antigen_region 286 antigen_region 287 antigen_region 288 antigen_region 289 antigen_region 290 antigen_region 291 antigen_region 292 antigen_region 293 antigen_region 294 antigen_region 295 antigen_region 296 antigen_region 297 antigen_region 298 antigen_region 299 antigen_region 300 antigen_region 301 antigen_region 302 antigen_region 303 antigen_region 304 antigen_region 305 antigen_region 306 antigen_region 307 antigen_region 308 antigen_region 309 antigen_region 310 antigen_region 311 antigen_region 312 antigen_region 313 antigen_region 314 antigen_region 315 antigen_region 316 antigen_region 317 antigen_region 318 antigen_region 319 antigen_region 320 antigen_region 321 antigen_region 322 antigen_region 323 antigen_region 324 antigen_region 325 antigen_region 326 antigen_region 327 antigen_region 328 antigen_region 329 antigen_region 330 antigen_region 331 antigen_region 332 antigen_region 333 antigen_region 334 antigen_region 335 antigen_region 336 antigen_region 337 antigen_region 338 antigen_region 339 antigen_region 340 antigen_region 341 antigen_region 342 antigen_region 343 
antigen_region 344 antigen_region 345 antigen_region 346 antigen_region 347 antigen_region 348 antigen_region 349 antigen_region 350 antigen_region 351 antigen_region 352 antigen_region 353 antigen_region 354 antigen_region 355 antigen_region 356 antigen_region 357 antigen_region 358 antigen_region 359 antigen_region 360 antigen_region 361 antigen_region 362 antigen_region 363 antigen_region 364 antigen_region 365 antigen_region 366 antigen_region 367 antigen_region 368 antigen_region 369 antigen_region 370 antigen_region 371 antigen_region 372 antigen_region 373 antigen_region 374 antigen_region 375 antigen_region 376 antigen_region 377 antigen_region 378 antigen_region 379 antigen_region 380 antigen_region 381 antigen_region 382 antigen_region 383 antigen_region 384 antigen_region 385 antigen_region 386 antigen_region 387 antigen_region 388 antigen_region 389 antigen_region 390 antigen_region 391 antigen_region 392 antigen_region 393 antigen_region 394 antigen_region 395 antigen_region 396 antigen_region 397 antigen_region 398 antigen_region 399 antigen_region 400 antigen_region 401 antigen_region 402 antigen_region 403 antigen_region 404 antigen_region 405 antigen_region 406 antigen_region 407 antigen_region 408 antigen_region 409 antigen_region 410 antigen_region 411 antigen_region 412 antigen_region 413 antigen_region 414 antigen_region 415 antigen_region 416 antigen_region 417 antigen_region 418 antigen_region 419 antigen_region 420 antigen_region 421 antigen_region 422 antigen_region 423 antigen_region 424 antigen_region 425 antigen_region 426 antigen_region 427 antigen_region 428 antigen_region 429 antigen_region 430 antigen_region 431 antigen_region 432 antigen_region 433 antigen_region 434 antigen_region 435 antigen_region 436 antigen_region 437 antigen_region 438 antigen_region 439 antigen_region 440 antigen_region 441 antigen_region 442 antigen_region 443 antigen_region 444 antigen_region 445 antigen_region 446 antigen_region 447 antigen_region 448 
antigen_region 449 antigen_region 450 antigen_region 451 antigen_region 452 antigen_region 453 antigen_region 454 antigen_region 455 antigen_region 456 antigen_region 457 antigen_region 458 antigen_region 459 antigen_region 460 antigen_region 461 antigen_region 462 antigen_region 463 antigen_region 464 antigen_region
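A per-residue listing this long is easier to read once consecutive residues with the same label are collapsed into ranges. The sketch below does that in pure Python, without PyRosetta. The CDR boundaries and the antibody/antigen split are copied from the printouts above; label_of and segments are hypothetical helper names, and label_of only imitates what get_region_of_residue reports for this particular pose.

```python
from itertools import groupby

# CDR boundaries in pose numbering, copied from the AntibodyInfo printout above.
CDR_RANGES = {
    "L1": (24, 34), "L2": (49, 56), "L3": (89, 97),
    "H1": (131, 143), "H2": (158, 167), "H3": (205, 214),
}
LAST_ANTIBODY_RESIDUE = 225  # residues 226-464 are antigen in the listing above
POSE_SIZE = 464


def label_of(resnum):
    """Return the CDR name, 'framework_region', or 'antigen_region' for a residue."""
    for name, (start, stop) in CDR_RANGES.items():
        if start <= resnum <= stop:
            return name
    if resnum <= LAST_ANTIBODY_RESIDUE:
        return "framework_region"
    return "antigen_region"


def segments(size):
    """Collapse consecutive residues sharing a label into (label, first, last)."""
    result = []
    for label, group in groupby(range(1, size + 1), key=label_of):
        residues = list(group)
        result.append((label, residues[0], residues[-1]))
    return result


for label, first, last in segments(POSE_SIZE):
    print(f"{first:>3}-{last:<3} {label}")
```

With a real pose, the same groupby call could take the strings printed by the solution loop above as its key instead of the hard-coded ranges.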
Use either the PyRosetta docs on AntibodyInfo or tab completion in the interactive notebook to get the length and cluster of L1 from AntibodyInfo.
### BEGIN SOLUTION
print(ab_info.get_CDR_length(antibody.l1))
print(ab_info.get_CDR_cluster(antibody.l1).cluster())
### END SOLUTION
11 CDRClusterEnum.L1_11_1
The CDRCluster object has a lot of information about a particular cluster. Let's use it to get the normalized distance in degrees of the L1 cluster.
L1_cluster = ab_info.get_CDR_cluster(antibody.l1)
print(L1_cluster.normalized_distance_in_degrees())
7.137242784087944
Anything below 35 or 40 degrees is very close to the cluster center. This is a structure with a very well-defined L1-11-1 loop - one of the most common L1 lengths and clusters.
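As a small illustration, the cluster names and the distance rule of thumb can be handled with two helpers in plain Python. Both function names are hypothetical, the name format (CDR, length, then the cluster label) is inferred from the names printed above, and the default cutoff of 40 degrees is just the upper of the two values quoted in the text, not a Rosetta constant.

```python
def parse_cluster_name(name):
    """Split a name such as 'L1-11-1' or 'L3-9-cis7-1' into CDR, length, and cluster label."""
    cdr, length, rest = name.split("-", 2)
    return {"cdr": cdr, "length": int(length), "cluster": rest}


def is_near_center(distance_in_degrees, cutoff=40.0):
    """True when the normalized distance falls below the chosen cutoff (an assumed rule of thumb)."""
    return distance_in_degrees < cutoff


print(parse_cluster_name("L1-11-1"))
print(parse_cluster_name("L3-9-cis7-1"))
print(is_near_center(7.137242784087944))
```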
It may not seem like much, but numbering scheme translation is very difficult to do without mistakes. Rosetta now has this ability in a highly tested and fairly easy-to-use implementation, which makes it much easier to follow antibody structural or sequence papers. Let's take a look. We'll use AntibodyInfo and the get_landmark_resnum() function for this, but you could also use the get_antibody_numbering_info() function, which will give you all the conversions, though it is certainly a bit trickier to use.
In the AHo numbering scheme, the conserved cysteine residue forming the intradomain disulfide bridge always carries the label "23", as in the IMGT numbering scheme, while in Kabat it is labeled L23 in Vk and Vl and H22 in VH. Let's find this residue in our antibody. https://www.bioc.uzh.ch/plueckthun/antibody/Numbering/FR1/index.html
rosetta_num = ab_info.get_landmark_resnum(pose, antibody.Kabat_Scheme, 'H', 22)
What is the chain and resnum in OUR AHo numbering scheme? Is this a cysteine? Is it part of a disulfide?
### BEGIN SOLUTION
print(pose.pdb_info().pose2pdb(rosetta_num))
print(pose.residue(rosetta_num))
### END SOLUTION
23 H Residue 130: CYS:disulfide (CYS, C): Base: CYS Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA Variant types: DISULFIDE Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB SG 1HB 2HB Atom Coordinates: N : -13.918, -0.011, 40.022 CA : -15.022, -0.943, 39.837 C : -16.073, -0.624, 40.895 O : -15.877, -0.945, 42.066 CB : -14.515, -2.379, 40.021 SG : -15.8, -3.608, 40.319 H : -13.6187, 0.217354, 40.9592 HA : -15.4065, -0.826975, 38.8236 1HB : -13.9648, -2.68746, 39.1317 2HB : -13.8236, -2.41565, 40.8626 Mirrored relative to coordinates in ResidueType: FALSE
OK, cool. Let's do the same thing for the cysteine that is bonded to this residue.
In IMGT this is residue 104 on the heavy chain. Use tab completion on antibody.IMGT_Scheme to find the enum. https://www.bioc.uzh.ch/plueckthun/antibody/Numbering/FR3a/index.html
### BEGIN SOLUTION
pre_cdr3_c = ab_info.get_landmark_resnum(pose, antibody.IMGT_Scheme, 'H', 104)
### END SOLUTION
Once again, what is the residue in our AHO-numbered antibody? Is it a Cysteine? Is it disulfide bonded?
### BEGIN SOLUTION
print(pose.pdb_info().pose2pdb(pre_cdr3_c))
print(pose.residue(pre_cdr3_c))
### END SOLUTION
106 H Residue 204: CYS:disulfide (CYS, C): Base: CYS Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA Variant types: DISULFIDE Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB SG 1HB 2HB Atom Coordinates: N : -14.312, -6.402, 36.316 CA : -14.452, -6.929, 37.646 C : -15.678, -7.837, 37.662 O : -16.599, -7.672, 36.856 CB : -14.501, -5.824, 38.705 SG : -15.935, -4.767, 38.638 H : -14.9281, -5.66132, 36.0129 HA : -13.5885, -7.5585, 37.8613 1HB : -14.4721, -6.27099, 39.699 2HB : -13.6222, -5.18697, 38.607 Mirrored relative to coordinates in ResidueType: FALSE
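Conceptually, a landmark lookup is a table from (scheme, chain, residue number) to a pose index. The sketch below hard-codes only the two lookups shown above so that it runs without PyRosetta. The dictionary and function names are hypothetical, and the real get_landmark_resnum reads Rosetta's numbering database rather than a hand-written table.

```python
# (scheme, chain, resnum) -> pose index, taken from the two lookups above.
LANDMARKS = {
    ("Kabat", "H", 22): 130,
    ("IMGT", "H", 104): 204,
}

# pose index -> (AHo resnum, chain), as printed by pose2pdb above.
POSE_TO_AHO = {130: (23, "H"), 204: (106, "H")}

# Disulfide reported for these pose indexes when the PDB was loaded.
DISULFIDES = {(130, 204)}


def landmark_to_aho(scheme, chain, resnum):
    """Translate a landmark in another scheme to (pose index, AHo label)."""
    pose_index = LANDMARKS[(scheme, chain, resnum)]
    return pose_index, POSE_TO_AHO[pose_index]


first = landmark_to_aho("Kabat", "H", 22)
second = landmark_to_aho("IMGT", "H", 104)
print(first, second)

# The two landmark cysteines are the partners of one intradomain disulfide.
print((first[0], second[0]) in DISULFIDES)
```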
Let's explore the sequence of this antibody.
ab_seq = ab_info.get_antibody_sequence()
print(ab_seq)
L1_seq = ab_info.get_CDR_sequence_with_stem(antibody.l1, pose)
print("L1", L1_seq)
for i in range(1, 7):
    cdr = antibody.CDRNameEnum(i)
    print(cdr, ab_info.get_CDR_sequence_with_stem(cdr, pose))
DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYTTPPTFGQGTKVEIKREVQLVESGGGLVQPGGSLRLSCAASGFTISNSGIHWVRQAPGKGLEWVGWIYPTGGATDYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARFWWRSFDYWGQGTLVTVSS L1 RASQDVSTAVA CDRNameEnum.h1 AASGFTISNSGIH CDRNameEnum.h2 WIYPTGGATD CDRNameEnum.h3 ARFWWRSFDY CDRNameEnum.l1 RASQDVSTAVA CDRNameEnum.l2 YSASFLYS CDRNameEnum.l3 QQSYTTPPT
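The CDR sequences are slices of the full sequence at the loop boundaries, which is easy to check by hand. This sketch uses only the first 56 residues of the sequence printed above and the L1 and L2 start/stop values from the AntibodyInfo printout. The cdr_slice helper is hypothetical and assumes pose numbering starts at 1 on the light chain, as it does here.

```python
# First 56 residues of the antibody sequence printed above (light chain start).
seq_prefix = "DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYS"


def cdr_slice(sequence, start, stop):
    """Return residues start..stop (1-indexed, inclusive) of a sequence."""
    return sequence[start - 1:stop]


# Loop boundaries copied from the AntibodyInfo printout: L1 24-34, L2 49-56.
l1_seq = cdr_slice(seq_prefix, 24, 34)
l2_seq = cdr_slice(seq_prefix, 49, 56)
print("L1", l1_seq)
print("L2", l2_seq)

# Pose residue 23, just before L1, is the conserved light-chain cysteine.
print(cdr_slice(seq_prefix, 23, 23))
```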
Use tab completion to find other useful functions. These include movemap, loops, and fold-tree creation for specific tasks. With ResidueSelectors, this functionality is not quite as useful, but you should know that it is here.
All functions are fair game except get_TaskFactory_AllCDRs and get_TaskFactory_OneCDR. These will be removed from AntibodyInfo, as they are extremely specific to a particular antibody modeling task.
Util functions in Rosetta are stored in the util.hh file in each directory that has one. Within PyRosetta, these come along when you import the namespace. There are many that you should be aware of to make modeling and design tasks easier for custom protocols.
We will go through some examples here.
The get_cdr_loops function takes a vector1 bool of CDRs. Use the enums to set H3 and L3 to True. A vector1 bool starts as all False.
h3_l3 = rosetta.utility.vector1_bool(6)
print(h3_l3)
h3_l3[antibody.h3] = True
h3_l3[antibody.l3] = True
#Here, we get cdr loops, and set the stem size to 2,
# so we include 2 residues on either side of the CDR loop (called the stem), to help us in modeling.
h3_l3_loops = antibody.get_cdr_loops(ab_info, pose, h3_l3, 2)
print(h3_l3_loops)
vector1_bool[0, 0, 0, 0, 0, 0] LOOP begin end cut skip_rate extended LOOP start: 203 stop: 216 cut: 210 size: 14 skip rate: 0 extended?: False LOOP start: 87 stop: 99 cut: 93 size: 13 skip rate: 0 extended?: False
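The effect of the stem argument can be checked with simple arithmetic: each loop grows by the stem size on both sides. The sketch below reproduces the start, stop, and size values from the output above in plain Python. The with_stem helper is hypothetical, and it deliberately leaves out the cutpoint, which Rosetta chooses itself.

```python
# CDR boundaries without stem, copied from the AntibodyInfo printout.
CDR_BOUNDS = {"H3": (205, 214), "L3": (89, 97)}


def with_stem(start, stop, stem):
    """Extend a loop by `stem` residues on each side; return (start, stop, size)."""
    new_start = start - stem
    new_stop = stop + stem
    return new_start, new_stop, new_stop - new_start + 1


for name, (start, stop) in CDR_BOUNDS.items():
    print(name, with_stem(start, stop, 2))
```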
We could use the NeighborhoodResidueSelector, as you have done in the past, to get neighbors. Instead, let's use a general function to get all the epitope residues within an 8 Angstrom distance of the paratope.
epi_residues = antibody.select_epitope_residues(ab_info, pose, 8)
total=0
for i in range(1, len(epi_residues)+1):
    if epi_residues[i]:
        print(i)
        total+=1
print("Total Epitope Residues:", total)
267 270 271 272 273 299 300 301 302 303 304 305 307 308 309 310 313 396 397 398 454 458 Total Epitope Residues: 22
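The idea behind a distance-based epitope selection can be sketched in a few lines of plain Python: keep every antigen residue that has at least one atom within the cutoff of any paratope atom. The coordinates below are made up for illustration and are not taken from 2r0l. The within_cutoff helper is hypothetical, and the actual select_epitope_residues function may choose atoms and measure distances differently.

```python
import math


def within_cutoff(candidates, targets, cutoff):
    """Return sorted ids of candidate residues with any atom within `cutoff` of any target atom."""
    selected = []
    for res_id, atoms in candidates.items():
        if any(math.dist(atom, target) <= cutoff for atom in atoms for target in targets):
            selected.append(res_id)
    return sorted(selected)


# Toy coordinates in Angstroms (made up, not from the structure).
paratope_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
antigen_residues = {
    301: [(3.0, 4.0, 0.0)],                     # 5 A from the first paratope atom
    302: [(20.0, 0.0, 0.0)],                    # 19 A from the nearest paratope atom
    303: [(9.0, 0.0, 0.0), (30.0, 0.0, 0.0)],   # one atom exactly 8 A away
}

epitope = within_cutoff(antigen_residues, paratope_atoms, 8.0)
print(epitope, "Total Epitope Residues:", len(epitope))
```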
So that was cool. But let's use the wonderful ReturnResidueSubsetSelector to take this ResidueSubset of the epitope residues and store the data as a ResidueSelector!
epi_res_selector = selections.ReturnResidueSubsetSelector(epi_residues)
Now what? Let's use some SimpleMetrics with the selector to calculate something about these epitope residues.
import rosetta.core.simple_metrics.metrics as sm
sasa_metric = sm.SasaMetric(epi_res_selector)
print("\nSASA", sasa_metric.calculate(pose))
total_metric = sm.TotalEnergyMetric(epi_res_selector)
print("\nTOTAL RESIDUE ENERGY", total_metric.calculate(pose))
#Lets use a useful metric to select these residues in pymol
pymol_metric = sm.SelectedResiduesPyMOLMetric(epi_res_selector)
print("\nSELECTION", pymol_metric.calculate(pose))
SASA 531.9639835627297
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
TOTAL RESIDUE ENERGY -2.6964334237038683
SELECTION select rosetta_sele, (chain A and resid 42,45,46,47,48,74,75,76,77,78,79,80,82,83,84,85,88,171,172,173,229,233)
Now let's see which of these residues are most buried in the interface and which have the lowest energy. Note that this is not ddG; we would need to separate the chains for that. We can use the protocols.toolbox.rigid_body.translate function to do that.
Use the PyMOL selection (copy from select...) and take a look at the residues in PyMOL. Then run the code below.
import rosetta.core.simple_metrics.per_residue_metrics as residue_sm
import operator
res_sasa_metric = residue_sm.PerResidueSasaMetric()
res_sasa_metric.set_residue_selector(epi_res_selector)
per_res_sasa = res_sasa_metric.calculate(pose)
#print(per_res_sasa)
#The Map behaves much like a dictionary: sort its (residue, SASA) pairs by SASA, most buried first.
for ele in sorted(per_res_sasa.items(), key=operator.itemgetter(1), reverse=False):
    print(ele)
(300, 0.0) (303, 0.0) (305, 0.0) (304, 0.4468042885105504) (267, 1.024686682355324) (398, 1.8244508447514138) (302, 4.098746729421277) (396, 4.098746729421277) (271, 5.380780270728442) (307, 5.504698647620041) (301, 7.689850871116937) (313, 8.068377879289471) (270, 8.322490061360945) (458, 14.530261630816586) (310, 17.873691916125896) (308, 39.92505047193878) (454, 46.22431607929343) (397, 54.69632267990853) (299, 60.31999848338232) (309, 75.13475653929288) (272, 76.34620896564937) (273, 100.45374379174626)
Cool. So the most buried residues at the interface are 300, 303, 305. Convert those to the PDB chain/num using PDBInfo and take a look at them in PyMOL.
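For a quick look in PyMOL, the buried residues can be turned into a selection string by hand. The sketch below runs without PyRosetta: the SASA values are a subset of the output above, the pose-to-PDB pairs are copied from the notebook's pose2pdb output, and pymol_selection is a hypothetical helper that imitates the format printed by SelectedResiduesPyMOLMetric.

```python
# Subset of the per-residue SASA values printed above (pose numbering).
per_res_sasa = {
    267: 1.024686682355324,
    273: 100.45374379174626,
    300: 0.0,
    303: 0.0,
    304: 0.4468042885105504,
    305: 0.0,
}

# pose index -> (PDB resnum, chain), copied from the notebook's pose2pdb output.
pose_to_pdb = {300: (75, "A"), 303: (78, "A"), 305: (80, "A")}


def pymol_selection(name, pdb_ids):
    """Build a PyMOL 'select' command from (resnum, chain) pairs, grouped by chain."""
    chains = {}
    for resnum, chain in pdb_ids:
        chains.setdefault(chain, []).append(str(resnum))
    parts = [f"(chain {chain} and resid {','.join(nums)})" for chain, nums in chains.items()]
    return f"select {name}, " + " or ".join(parts)


# Residues with zero SASA are fully buried at the interface.
buried = sorted(res for res, sasa in per_res_sasa.items() if sasa == 0.0)
selection = pymol_selection("buried_epitope", [pose_to_pdb[res] for res in buried])
print(buried)
print(selection)
```

The selection name buried_epitope is arbitrary; paste the printed command into the PyMOL command line with the structure loaded.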
res_energy_metric = residue_sm.PerResidueEnergyMetric()
res_energy_metric.set_residue_selector(epi_res_selector)
per_res_energy = res_energy_metric.calculate(pose)
#print(per_res_energy)
#The Map behaves much like a dictionary: sort its (residue, energy) pairs by energy, lowest first.
for ele in sorted(per_res_energy.items(), key=operator.itemgetter(1), reverse=False):
    print(ele[0], pose.pdb_info().pose2pdb(ele[0]), ele[1])
The output recorded below was produced by calling res_sasa_metric.calculate(pose) instead of res_energy_metric.calculate(pose), so it repeats the per-residue SASA values from the previous cell. Rerun the cell above to get per-residue energies.
300 75 A 0.0 303 78 A 0.0 305 80 A 0.0 304 79 A 0.4468042885105504 267 42 A 1.024686682355324 398 173 A 1.8244508447514138 302 77 A 4.098746729421277 396 171 A 4.098746729421277 271 46 A 5.380780270728442 307 82 A 5.504698647620041 301 76 A 7.689850871116937 313 88 A 8.068377879289471 270 45 A 8.322490061360945 458 233 A 14.530261630816586 310 85 A 17.873691916125896 308 83 A 39.92505047193878 454 229 A 46.22431607929343 397 172 A 54.69632267990853 299 74 A 60.31999848338232 309 84 A 75.13475653929288 272 47 A 76.34620896564937 273 48 A 100.45374379174626
Wow! If a residue such as 48A comes out very high in energy, this may be because we are working with a crystal structure that has not been pre-relaxed using the pareto-optimal protocol. When using PDBs from the data bank for production runs, be sure to do this, outputting about 10 models and selecting the lowest-energy model. Or, you could relax into the crystal density. Either works well.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059004
from rosetta.protocols.antibody.residue_selector import *
cdr_selector = CDRResidueSelector(ab_info)
cdr_selector.set_cdrs(h3_l3)
sele = cdr_selector.apply(pose)
#The subset is 1-indexed, so the range must run to len(sele)+1 to include the last residue.
for i in range(1, len(sele)+1):
    if sele[i]:
        print(i, pose.pdb_info().pose2pdb(i))
89 107 L 90 108 L 91 109 L 92 110 L 93 111 L 94 135 L 95 136 L 96 137 L 97 138 L 205 107 H 206 108 H 207 109 H 208 110 H 209 111 H 210 134 H 211 135 H 212 136 H 213 137 H 214 138 H
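Residue subsets are 1-indexed, which makes loop bounds easy to get wrong: range(1, len(v)) stops one element short, while range(1, len(v) + 1) visits every residue. The sketch below shows the difference with a tiny stand-in container. Vector1 here is a hypothetical class written for this example, not the Rosetta vector1 type.

```python
class Vector1:
    """Hypothetical 1-indexed container, standing in for a Rosetta vector1."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Valid indexes run from 1 to len(self), inclusive.
        if not 1 <= index <= len(self._items):
            raise IndexError(index)
        return self._items[index - 1]


mask = Vector1([False, True, False, True])  # residues 2 and 4 are selected

# Stops at len(mask) - 1, so the last residue is never checked.
short = [i for i in range(1, len(mask)) if mask[i]]

# Runs through len(mask), so every residue is checked.
full = [i for i in range(1, len(mask) + 1) if mask[i]]

print(short, full)
```

For a CDR selection the short loop happens to give the same answer, because the last residue of the pose is antigen. For an antigen selection it would silently drop the final residue.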
We can use the AntibodyRegionSelector to select a specific region: antigen_region, framework_region, or cdr_region.
region_selector = AntibodyRegionSelector(ab_info)
region_selector.set_region(antibody.antigen_region)
sele = region_selector.apply(pose)
#The subset is 1-indexed, so the range must run to len(sele)+1 to include the last residue.
for i in range(1, len(sele)+1):
    if sele[i]:
        print(i, pose.pdb_info().pose2pdb(i))
Snugdock is available in the rosetta.protocols.antibody.snugdock namespace. Both the full protocol, SnugDockProtocol, and the mover, Snugdock, are available and easy to set up through code, but their run time is extremely long.
H3 modeling is provided by the Antibody_H3 app. Personally, I would use the Rosetta C++ application for this, with the specific options given in the docs; however, you can call this in PyRosetta.
The AntibodyCDRGrafter mover is in the protocols.antibody namespace. Documentation on this mover can be found here (XML or code-level interface is available): https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/antibodies/AntibodyCDRGrafter
Please cite these papers when using any of RosettaAntibody.
J. Adolf-Bryfogle, O Kalyuzhniy, M Kubitz, B. D. Weitzner, X Hu, Y Adachi, W R. Schief, R L. Dunbrack Jr.,
B. D. Weitzner, J. R. Jeliazkov, S. Lyskov*, N. M. Marze, D. Kuroda, R. Frick, J. Adolf-Bryfogle, N. Biswas, R. L. Dunbrack Jr., and J. J. Gray,
B. D. Weitzner, D. Kuroda, N. M. Marze, J. Xu & J. J. Gray,
A. Sivasubramanian,* A. Sircar,* S. Chaudhury & J. J. Gray,