Keywords: carbohydrate, glycan, sugar, glucose, mannose, GlycanTreeSet, saccharide, furanose, pyranose, aldose, ketose
In this chapter, we will focus on a special subset of non-peptide oligo- and polymers — carbohydrates.
Modeling carbohydrates (also known as saccharides, glycans, or simply sugars) comes with some special challenges. For one, most saccharide residues contain a ring as part of their backbone. This ring provides potentially new degrees of freedom when sampling. Additionally, carbohydrate structures are often branched, which in Rosetta leads to more complicated `FoldTree`s.
This chapter includes a quick overview of carbohydrate nomenclature, structure, and basic interactions within Rosetta.
Sugars (saccharides) are defined as hydroxylated aldehydes and ketones. A typical monosaccharide has an equal number of carbon and oxygen atoms. For example, glucose has the molecular formula C6H12O6.
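As a quick arithmetic check of the formula above, here is a minimal sketch that writes the general formula CnH2nOn for an unmodified monosaccharide with n carbons. The function name is ours and is not part of Rosetta, and the rule does not hold for deoxy or otherwise modified sugars.

```python
def monosaccharide_formula(n_carbons: int) -> str:
    """Return the formula CnH2nOn of an unmodified monosaccharide."""
    if n_carbons < 3:
        raise ValueError("a monosaccharide has at least three carbons")
    return f"C{n_carbons}H{2 * n_carbons}O{n_carbons}"


print(monosaccharide_formula(6))  # hexoses such as glucose: C6H12O6
print(monosaccharide_formula(5))  # pentoses such as ribose: C5H10O5
```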
Sugars containing more than three carbons will spontaneously cyclize in aqueous environments to form five- or six-membered hemiacetals and hemiketals. Sugars with five-membered rings are called furanoses; those with six-membered rings are called pyranoses (Fig. 1).
A sugar is classified as an aldose or ketose, depending on whether it has an aldehyde or ketone in its linear form (Fig. 2).
The different sugars have different names, depending on the stereochemistry at each of the carbon atoms in the molecule. For example, glucose has one set of stereochemistries, while mannose has another.
In addition to their full names, many individual saccharide residues have three-letter codes, just like amino acid residues do. Glucose is "Glc" and mannose is "Man".
A glycan tree is made up of many sugar residues, each residue a ring. The 'backbone' of a glycan is the connection between one residue and another. The chemical makeup of each sugar residue in this 'linkage' affects the propensity/energy of each backbone dihedral angle. In addition, sugars can be attached via different carbons of the parent glycan. In this way, both the chemical makeup and the attachment position affect the dihedral propensities. Typically, there are two backbone dihedral angles, but there can be four or more depending on the connection.
In IUPAC nomenclature, the dihedrals of residue N are defined as the dihedrals between N and N-1 (i.e., the parent linkage). The dihedrals of the ASN (or other glycosylated protein residue) become part of the first glycan residue that is connected. The first glycan residue connected to an ASN therefore has four torsions, while the ASN now has none!
If you are creating a movemap for dihedral residues, please use the `MoveMapFactory`, as this has the IUPAC nomenclature of glycan residues built in to allow proper DOF sampling of the backbone residues, especially for branching glycan trees. In general, all of our samplers should use residue selectors and use the `MoveMapFactory` to build movemaps internally.
A sugar's side chains are the substituents of the glycan ring, which are typically an OH group or an acetyl group. These are sampled together at 60 degree angles by default during packing. A higher granularity of rotamers cannot currently be handled in Rosetta, but 60 degrees seems adequate for our purposes.
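To see why a finer grid quickly becomes expensive, the sketch below enumerates torsion values on a regular grid and counts the combinations for several side-chain torsions. This is only an illustration of the combinatorics: the grid origin (-180 degrees), the exhaustive enumeration, and all function names are assumptions of ours, not a description of how the Rosetta packer builds sugar rotamers.

```python
from itertools import product


def torsion_grid(step_degrees=60):
    """Evenly spaced torsion values in [-180, 180) degrees."""
    if 360 % step_degrees != 0:
        raise ValueError("step must divide 360 evenly")
    return list(range(-180, 180, step_degrees))


def count_combinations(n_torsions, step_degrees=60):
    """Number of joint settings when every torsion uses the same grid."""
    return len(torsion_grid(step_degrees)) ** n_torsions


def enumerate_combinations(n_torsions, step_degrees=60):
    """Lazily yield every joint setting as a tuple of angles."""
    return product(torsion_grid(step_degrees), repeat=n_torsions)


print(torsion_grid())             # [-180, -120, -60, 0, 60, 120]
print(count_combinations(4, 60))  # 6**4 = 1296
print(count_combinations(4, 30))  # 12**4 = 20736
```

Halving the step from 60 to 30 degrees multiplies the count for four torsions by sixteen, which gives a feel for why coarse sampling is the practical default.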
Within Rosetta, glycan connectivity information is stored in the `GlycanTreeSet`, which is continually updated to reflect any residue changes or additions to the pose.
This info is always available through the function
pose.glycan_tree_set()
Chemical information of each glycan residue can be accessed through the CarbohydrateInfo object, which is stored in each ResidueType object:
pose.residue_type(i).carbohydrate_info()
We will cover both of these classes in the next tutorial.
Residue centric modeling and design of saccharide and glycoconjugate structures. Jason W. Labonte, Jared Adolf-Bryfogle, William R. Schief, Jeffrey J. Gray. Journal of Computational Chemistry, 11/30/2016. https://doi.org/10.1002/jcc.24679
Automatically Fixing Errors in Glycoprotein Structures with Rosetta. Brandon Frenz, Sebastian Rämisch, Andrew J. Borst, Alexandra C. Walls, Jared Adolf-Bryfogle, William R. Schief, David Veesler, Frank DiMaio. Structure, 1/2/2019.
Let's use PyRosetta to compare some common monosaccharide residues and see how they differ. As usual, we start by importing the `pyrosetta` and `rosetta` namespaces.
!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()
from pyrosetta import *
from pyrosetta.teaching import *
from pyrosetta.rosetta import *
First, one needs the `-include_sugars` option, which tells Rosetta to load sugars and add the `sugar_bb` energy term to a default scorefunction. This score term is like `rama` for the sugar dihedrals that connect each sugar residue.
init('-include_sugars')
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.39+release.93456a567a8125cafdf7f8cb44400bc20b570d81 2019-09-26T14:24:44] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team. core.init: Checking for fconfig files in pwd and ./rosetta/flags core.init: Reading fconfig.../Users/jadolfbr/.rosetta/flags/common core.init: core.init: core.init: Rosetta version: PyRosetta4.Release.python36.mac r233 2019.39+release.93456a567a8 93456a567a8125cafdf7f8cb44400bc20b570d81 http://www.pyrosetta.org 2019-09-26T14:24:44 core.init: command: PyRosetta -include_sugars -database /Users/jadolfbr/Library/Python/3.6/lib/python/site-packages/pyrosetta-2019.39+release.93456a567a8-py3.6-macosx-10.6-intel.egg/pyrosetta/database basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=1177525307 seed_offset=0 real_seed=1177525307 basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=1177525307 RG_type=mt19937
When loading structures from the PDB that include glycans, we use these options. This includes an option to write out the structures in pdb format instead of the (better) Rosetta format. We will be using these options in the next tutorial.
-maintain_links
-auto_detect_glycan_connections
-alternate_3_letter_codes pdb_sugar
-write_glycan_pdb_codes
-load_PDB_components false
pm = PyMOLMover()
We will use the function `pose_from_saccharide_sequence()`, which must be imported from the `core.pose` namespace. Unlike with peptide chains, one-letter codes will not suffice when specifying saccharide chains, because there is too much information to convey; we must use at least four letters. The first three letters are the sugar's three-letter code; the fourth letter designates whether the residue is a furanose (`f`) or pyranose (`p`).
from pyrosetta.rosetta.core.pose import pose_from_saccharide_sequence
glucose = pose_from_saccharide_sequence('Glcp')
galactose = pose_from_saccharide_sequence('Galp')
mannose = pose_from_saccharide_sequence('Manp')
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set. Created 1251 residue types core.chemical.GlobalResidueTypeSet: Total time to initialize 1.25647 seconds. core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
Just like with peptides, saccharides come in two enantiomeric forms, labelled l and d. (Note the small-caps, used in print.) These can be loaded into PyRosetta using the prefixes `L-` and `D-`.
L_glucose = pose_from_saccharide_sequence('L-Glcp')
D_glucose = pose_from_saccharide_sequence('D-Glcp')
core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
The carbon that is at a higher oxidation state — that is, the carbon of the hemiacetal/-ketal in the cyclic form or the carbon that is the carbonyl carbon of the aldehyde or ketone in the linear form — is called the anomeric carbon. Because the carbonyl of an aldehyde or ketone is planar, a sugar molecule can cyclize into one of two forms, one in which the resulting hydroxyl group is pointing "up" and another in which the same hydroxyl group is pointing "down". These two anomers are labelled α and β.
alpha_D_glucose = pose_from_saccharide_sequence('a-D-Glcp')
core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
Oligo- and polysaccharides are composed of simple monosaccharide residues connected by acetal and ketal linkages called glycosidic bonds. Any of the monosaccharide's hydroxyl groups can be used to form a linkage to the anomeric carbon of another monosaccharide, leading to both linear and branched molecules.
Rosetta can create both linear and branched oligosaccharides from an IUPAC sequence. (IUPAC is the international organization dedicated to chemical nomenclature.)
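To properly build a linear oligosaccharide, Rosetta must know the following details about each sugar residue being created, in the following order:
- Main-chain connectivity: →2) (`->2)`), →4) (`->4)`), →6) (`->6)`), etc.; default value is `->4)-`
- Anomeric form: α (`a` or `alpha`) or β (`b` or `beta`); default value is `alpha`
- Enantiomeric form: l (`L`) or d (`D`); default value is `D`
- Three-letter code, such as `Glc` or `Man`
- Ring form code: `f` for a furanose or `p` for a pyranose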
Residues must be separated by hyphens. Glycosidic linkages can be specified with full IUPAC notation, e.g., `-(1->4)-` for “-(1→4)-”. (This means that the residue on the left connects from its C1 (anomeric) position to the hydroxyl oxygen at C4 of the residue on the right.) Rosetta will assume `-(1->` for aldoses and `-(2->` for ketoses.
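The defaults described above can be made concrete with a small parser for a single residue of a saccharide sequence. This is a minimal sketch written for this tutorial, not Rosetta's own parser: the regular expression, the function name, and the returned dictionary keys are all assumptions, and the sketch handles neither full linkages such as `-(1->4)-` nor branches.

```python
import re

# Illustrative pattern for ONE residue; not taken from the Rosetta source.
RESIDUE_PATTERN = re.compile(
    r"^(?:->(?P<link>\d)\)-)?"           # main-chain connectivity, e.g. ->6)-
    r"(?:(?P<anomer>alpha|beta|a|b)-)?"  # anomeric form
    r"(?:(?P<enantiomer>[LD])-)?"        # enantiomeric form
    r"(?P<code>[A-Z][a-z]{2})"           # three-letter code, e.g. Glc
    r"(?P<ring>[fp])"                    # f = furanose, p = pyranose
    r"(?P<modification>\w*)$"            # optional suffix such as NAc
)

ANOMERS = {"a": "alpha", "alpha": "alpha", "b": "beta", "beta": "beta"}
RINGS = {"f": "furanose", "p": "pyranose"}


def parse_residue(text):
    """Split one residue string into its parts, filling in the defaults."""
    match = RESIDUE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse residue: {text!r}")
    return {
        "link": int(match["link"] or 4),              # default ->4)-
        "anomer": ANOMERS[match["anomer"] or "alpha"],  # default alpha
        "enantiomer": match["enantiomer"] or "D",     # default D
        "code": match["code"],
        "ring": RINGS[match["ring"]],
        "modification": match["modification"],
    }


print(parse_residue("Glcp"))
print(parse_residue("->6)-Glcp"))
print(parse_residue("b-D-GlcpNAc"))
```

With the defaults filled in, `Glcp` resolves to the same description as the residue name `->4)-alpha-D-Glcp` that Rosetta reports for a bare glucose.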
Note that the standard is to write the IUPAC sequence of a saccharide chain in reverse order from how the residues are numbered. Let's create three new oligosaccharides from sequence.
maltotriose = pose_from_saccharide_sequence('a-D-Glcp-' * 3)
lactose = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-Glcp')
isomaltose = pose_from_saccharide_sequence('->6)-Glcp-' * 2)
core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
When you print a `Pose` containing carbohydrate residues, the sugar residues will be listed as `Z` in the sequence.
print("maltotriose\n", maltotriose)
print("\nisomaltose\n", isomaltose)
print("\nlactose\n", lactose)
maltotriose PDB file name: alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp Total residues: 3 Sequence: ZZZ Fold tree: FOLD_TREE EDGE 1 3 -1 isomaltose PDB file name: alpha-D-Glcp-(1->6)-alpha-D-Glcp Total residues: 2 Sequence: ZZ Fold tree: FOLD_TREE EDGE 1 2 -1 lactose PDB file name: beta-D-Galp-(1->4)-alpha-D-Glcp Total residues: 2 Sequence: ZZ Fold tree: FOLD_TREE EDGE 1 2 -1
However, you can have Rosetta print out the sequences for individual chains, using the `chain_sequence()` method. If you do this, Rosetta is smart enough to give you a distinct sequence format for saccharide chains. (You may have noticed that the default file name for a `.pdb` file created from this `Pose` will be the same sequence.)
print(maltotriose.chain_sequence(1))
alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp
print(isomaltose.chain_sequence(1))
alpha-D-Glcp-(1->6)-alpha-D-Glcp
print(lactose.chain_sequence(1))
beta-D-Galp-(1->4)-alpha-D-Glcp
Again, the standard is to show the sequence of a saccharide chain in reverse order from how they are numbered.
This is also how phi, psi, and omega are defined: from i+1 to i.
for res in lactose.residues:
    print(res.seqpos(), res.name())
1 ->4)-alpha-D-Glcp:reducing_end 2 ->4)-beta-D-Galp:non-reducing_end
Notice that for polysaccharides, the upstream residue is called the reducing end, while the downstream residue is called the non-reducing end.
You will also see the terms parent and child being used across Rosetta. Here, for Residue 2, residue 1 is the parent. For Residue 1, Residue 2 is the child. Due to branching, residues can have more than one child/non-reducing-end, but only a single parent residue.
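The reversed numbering and the parent/child relationship can be sketched in plain Python for a linear chain. The helper names below are hypothetical, and the sketch assumes a chain written with full linkages and no branches; for real poses this information comes from Rosetta itself.

```python
import re

# Matches a full linkage such as "-(1->4)-" between two residues.
LINKAGE = re.compile(r"-\(\d->\d\)-")


def number_linear_chain(iupac_sequence):
    """Map Rosetta-style residue numbers to residue names.

    The IUPAC sequence is written non-reducing end first, so the last
    residue in the string becomes residue 1 (the reducing end).
    """
    residues = LINKAGE.split(iupac_sequence)
    return dict(enumerate(reversed(residues), start=1))


def linear_parents(numbering):
    """In a linear chain, the parent of residue i is residue i - 1."""
    return {index: (index - 1 if index > 1 else None) for index in numbering}


lactose_numbering = number_linear_chain("beta-D-Galp-(1->4)-alpha-D-Glcp")
print(lactose_numbering)                  # {1: 'alpha-D-Glcp', 2: 'beta-D-Galp'}
print(linear_parents(lactose_numbering))  # {1: None, 2: 1}
```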
Rosetta stores carbohydrate-specific information within `ResidueType`. If you print a residue, this additional information will be displayed.
print(glucose.residue(1))
Residue 1: ->4)-alpha-D-Glcp:reducing_end:non-reducing_end (Glc, Z): Base: ->4)-alpha-D-Glcp Properties: POLYMER CARBOHYDRATE LOWER_TERMINUS UPPER_TERMINUS POLAR CYCLIC HEXOSE ALDOSE D_SUGAR PYRANOSE ALPHA_SUGAR Variant types: UPPER_TERMINUS_VARIANT LOWER_TERMINUS_VARIANT Main-chain atoms: C1 C2 C3 C4 O4 Backbone atoms: C1 C2 C3 C4 O4 C5 O5 VO5 VC1 H1 H2 H3 H4 HO4 H5 Ring atoms: C1 C2 C3 C4 C5 O5 Side-chain atoms: O1 O2 O3 C6 O6 HO1 HO2 HO3 1H6 2H6 HO6 Carbohydrate Properties for this Residue: Basic Name: glucose IUPAC Name: alpha-D-glucopyranose Abbreviation: alpha-D-Glcp Classification: aldohexose Stereochemistry: D Ring Form: pyranose Anomeric Form: alpha Modifications: none Polymeric Information: Main chain connection: N/A Branch connections: none Ring Conformer: 4C1 (chair): C-P parameters (q, phi, theta): 0.55, 180, 0; nu angles (degrees): 60, -60, 60, -60, 60, -60 O1 : axial O2 : equatorial O3 : equatorial O4 : equatorial C6 : equatorial Atom Coordinates: C1 : 0, 0, 0 C2 : 1.55, 0, 0 C3 : 2.04812, 1.44664, 0 C4 : 1.50806, 2.11919, -1.26369 O4 : 1.94666, 3.46908, -1.30661 C5 : -0.0200415, 2.06186, -1.21358 O5 : -0.475077, 0.686176, -1.1593 VO5: -0.492509, 0.676579, -1.17187 (virtual) VC1: 0.031762, 0.00822503, 0.00564973 (virtual) O1 : -0.494034, 0.697555, 1.2082 O2 : 2.02401, -0.669275, 1.15922 O3 : 3.4779, 1.4716, 1.64563e-16 C6 : -0.614146, 2.71298, -2.43962 O6 : -0.225074, 4.07556, -2.53127 H1 : -0.370662, -1.03564, 0.00767336 H2 : 1.90812, -0.520035, -0.900727 H3 : 1.67301, 1.95456, 0.900727 H4 : 1.88381, 1.57916, -2.14527 HO4: 1.61609, 3.94572, -0.516717 H5 : -0.369153, 2.59396, -0.316372 HO1: -0.167832, 1.62167, 1.20877 HO2: 3.00401, -0.669275, 1.15922 HO3: 3.78886, 2.40096, 5.03844e-17 1H6 : -1.71106, 2.65811, -2.3783 2H6 : -0.261365, 2.17983, -3.33478 HO6: -0.621924, 4.47587, -3.33293 Mirrored relative to coordinates in ResidueType: FALSE
Most biopolymers have predefined, named torsion angles for their main-chain and side-chain bonds, such as φ, ψ, and ω and the various χs for amino acid residues. The same is true for saccharide residues. The torsion angles of sugars are as follows:
Take special note of how φ, ψ, and ω are defined in the reverse order as the angles of the same names for amino acid residues!
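Whatever the name, every torsion angle is the signed dihedral defined by four atoms. As a sketch that is independent of PyRosetta, the code below computes the C1-C2-C3-C4 ring torsion from the glucose coordinates printed earlier. The helper functions are ours; the sign convention is the usual one in which a cis arrangement is 0 degrees and trans is 180 degrees.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Coordinates copied from the printed glucose residue.
C1 = (0.0, 0.0, 0.0)
C2 = (1.55, 0.0, 0.0)
C3 = (2.04812, 1.44664, 0.0)
C4 = (1.50806, 2.11919, -1.26369)

print(round(dihedral(C1, C2, C3, C4), 1))  # -60.0
```

The result matches the idealized ±60 degree ring torsions listed for the 4C1 chair in the residue printout.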
The `chi()` method of `Pose` works with sugar residues in the same way that it works with amino acid residues, where the first argument is the χ subscript and the second is the residue number of the `Pose`.
galactose.chi(1, 1)
-60.49356158672178
galactose.chi(2, 1)
-180.0
galactose.chi(3, 1)
180.0
galactose.chi(4, 1)
-59.999999999999986
galactose.chi(5, 1)
-59.999999999999986
galactose.chi(6, 1)
180.0
Likewise, we can use `set_chi()` to change these torsion angles and observe the changes in PyMOL, setting the option to keep history to true.
from pyrosetta.rosetta.protocols.moves import AddPyMOLObserver
observer = AddPyMOLObserver(galactose, True)
pm.apply(galactose)
galactose.set_chi(1, 1, 180)
## BEGIN SOLUTION
# Set all six χ angles of residue 1; enumerate() pairs each angle with its χ index.
for chi_index, angle in enumerate([120, 60, 60, 0, 60, -60], start=1):
    print((chi_index, angle))
    galactose.set_chi(chi_index, 1, angle)
## END SOLUTION
(1, 120) (2, 60) (3, 60) (4, 0) (5, 60) (6, -60)
The `phi()`, `set_phi()`, `psi()`, `set_psi()`, `omega()`, and `set_omega()` methods of `Pose` also work with sugars. However, since `pose_from_saccharide_sequence()` may create a `Pose` with angles that cause the residues to wrap around onto each other, let's instead reload some `Pose`s from `.pdb` files.
maltotriose = pose_from_file('inputs/glycans/maltotriose.pdb')
isomaltose = pose_from_file('inputs/glycans/isomaltose.pdb')
core.import_pose.import_pose: File 'inputs/maltotriose.pdb' automatically determined to be of type PDB core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc3 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. core.import_pose.import_pose: File 'inputs/isomaltose.pdb' automatically determined to be of type PDB core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
pm.apply(maltotriose)
maltotriose.phi(1)
0.0
maltotriose.psi(1)
0.0
maltotriose.phi(2)
96.93460655617179
maltotriose.psi(2)
109.94421849476633
maltotriose.omega(2)
0.0
maltotriose.phi(3)
103.21420435050914
maltotriose.psi(3)
118.64096726060517
Notice how φ1 and ψ1 are undefined: the first residue is not connected to anything.
observer = AddPyMOLObserver(maltotriose, True)
for i in (2, 3):
    maltotriose.set_phi(i, 180)
    maltotriose.set_psi(i, 180)
Isomaltose is composed of (1→6) linkages, so in this case omega torsions are defined. Get and set φ2, ψ2, and ω2 for isomaltose.
observer = AddPyMOLObserver(isomaltose, True)
## BEGIN SOLUTION
print(isomaltose.phi(2))
print(isomaltose.psi(2))
print(isomaltose.omega(2))
## END SOLUTION
44.32677030464958 -170.8693381707546 49.383018645410004
Any cyclic residue also stores its ν angles.
pm.apply(glucose)
Glc1 = glucose.residue(1)
for i in range(1, 6): print(Glc1.nu(i))
59.99999999999999 -59.99999999999999 60.00000000000001 -59.999999999999986 59.99999999999999
However, we generally care more about the ring conformation of a cyclic residue’s rings, in this case, its only ring with index of 1. (The output values here are the ideal angles, not the actual angles, which we viewed above.)
print(Glc1.ring_conformer(1))
4C1 (chair): C-P parameters (q, phi, theta): 0.55, 180, 0; nu angles (degrees): 60, -60, 60, -60, 60, -60
The output above warrants a brief explanation. First, what does `4C1` mean? Most of us likely remember learning about chair and boat conformations in Organic Chemistry. Do you recall how there are two distinct chair conformations that can interconvert between each other? The names for these specific conformations are 4C1 and 1C4. The nomenclature is as follows: Superscripts to the left of the capital letter are above the plane of the ring if it is oriented such that its carbon atoms proceed in a clockwise direction when viewed from above. Subscripts to the right of the letter are below the plane of the ring. The letter itself is an abbreviation, where, for example, C indicates a chair conformation and B a boat conformation. In all, there are 38 different ideal ring conformations that any six-membered cycle can take.
`C-P parameters` refers to the Cremer–Pople parameters for this conformation (Cremer D, Pople JA. J Am Chem Soc. 1975;97:1354–1358.). C–P parameters are an alternative coordinate system used to refer to a ring conformation.
Finally, a `RingConformer` in Rosetta includes the values of the ν angles. Each conformer has a unique set of angles.
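To make the Cremer–Pople description less abstract, here is a minimal sketch that computes the total puckering amplitude Q and the angles θ and φ for a six-membered ring from Cartesian coordinates, following the definitions in the 1975 paper cited above. It is our own illustration, not Rosetta code. The test rings are hypothetical idealized chairs built on a regular hexagon, so the numbers are not those of a real pyranose, and no claim is made about which chair corresponds to 4C1.

```python
import math


def cremer_pople_six(ring):
    """Return (Q, theta, phi) for six ring atoms given in ring order.

    Q has the units of the input coordinates; theta and phi are in degrees.
    """
    if len(ring) != 6:
        raise ValueError("this sketch handles six-membered rings only")
    center = [sum(atom[k] for atom in ring) / 6 for k in range(3)]
    pts = [[atom[k] - center[k] for k in range(3)] for atom in ring]

    # Mean-plane normal n = R' x R''.
    r1 = [sum(p[k] * math.sin(2 * math.pi * j / 6) for j, p in enumerate(pts))
          for k in range(3)]
    r2 = [sum(p[k] * math.cos(2 * math.pi * j / 6) for j, p in enumerate(pts))
          for k in range(3)]
    n = [r1[1] * r2[2] - r1[2] * r2[1],
         r1[2] * r2[0] - r1[0] * r2[2],
         r1[0] * r2[1] - r1[1] * r2[0]]
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]

    # Out-of-plane displacement of each atom.
    z = [sum(p[k] * n[k] for k in range(3)) for p in pts]
    q2_cos = math.sqrt(1 / 3) * sum(
        zj * math.cos(4 * math.pi * j / 6) for j, zj in enumerate(z))
    q2_sin = -math.sqrt(1 / 3) * sum(
        zj * math.sin(4 * math.pi * j / 6) for j, zj in enumerate(z))
    q3 = math.sqrt(1 / 6) * sum((-1) ** j * zj for j, zj in enumerate(z))

    q2 = math.hypot(q2_cos, q2_sin)
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi


def ideal_chair(radius, height, first_up=True):
    """Hypothetical chair: regular hexagon with atoms alternately up and down."""
    sign = 1.0 if first_up else -1.0
    return [(radius * math.cos(math.radians(60 * j)),
             radius * math.sin(math.radians(60 * j)),
             sign * height * (-1) ** j)
            for j in range(6)]


q_a, theta_a, _ = cremer_pople_six(ideal_chair(1.4, 0.25, first_up=True))
q_b, theta_b, _ = cremer_pople_six(ideal_chair(1.4, 0.25, first_up=False))
print(round(q_a, 3))  # 0.612, i.e. sqrt(6) * 0.25
print(round(theta_a, 1), round(theta_b, 1))
```

The two chairs share the same Q and sit at opposite poles of the θ coordinate, which is how the C–P system distinguishes the two interconverting chair forms. For a perfect chair φ carries no information, so it is ignored here.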
`Pose::set_nu()` does not exist, because it would rip a ring apart. Instead, to change a ring conformation, we need to use the `set_ring_conformation()` method, which takes a `RingConformer` object. Most of the time, you will not need to adjust the ring conformers, but you should be aware of it.
We can ask a cyclic `ResidueType` for one of its `RingConformerSet`s to give us the `RingConformer` we want. (Each `RingConformerSet` includes the list of possible idealized ring conformers that such a ring can attain as well as information about the most energetically favorable one.) Then, we can set the conformation for our residue through `Pose`. (The arguments for `set_ring_conformation()` are the `Pose`’s sequence position, ring number, and the new conformer, respectively.)
ring_set = Glc1.type().ring_conformer_set(1)
conformer = ring_set.get_ideal_conformer_by_name('1C4')
glucose.set_ring_conformation(1, 1, conformer)
pm.apply(glucose)
`.pdb` File `LINK` Records
Modified sugars can also be created in Rosetta, either from sequence or from file. In the former case, simply use the proper abbreviation for the modification after the “ring form code”. For example, the abbreviation for an N-acetyl group is “NAc”. Note the N-acetyl group in the PyMOL window.
LacNAc = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-GlcpNAc')
pm.apply(LacNAc)
core.pose: by appending by jump... core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
Rosetta can handle branched oligosaccharides as well, but when loading from a sequence, this requires the use of brackets, which is the standard IUPAC notation. For example, here is how one would load Lewis x (Lex), a common branched glyco-epitope, into Rosetta by sequence.
Lex = pose_from_saccharide_sequence('b-D-Galp-(1->4)-[a-L-Fucp-(1->3)]-D-GlcpNAc')
pm.apply(Lex)
core.pose: by appending by jump... core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 3 ->4)-alpha-L-Fucp:non-reducing_end anchor: O3 1 root: C1 core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
One can also load branched carbohydrates from a `.pdb` file. These `.pdb` files must include `LINK` records, which are a standard part of the PDB format. Open the `test/data/carbohydrates/Lex.pdb` file and look near the top to see an example `LINK` record, which looks like this:
LINK O3 Glc A 1 C1 Fuc B 1 1555 1555 1.5
It tells us that there is a covalent linkage between O3 of glucose A1 and C1 of fucose B1 with a bond length of 1.5 Å. (The `1555`s indicate symmetry and are ignored by Rosetta.)
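The reading of the `LINK` record given above can be written as a small parser. This is a simplified sketch with hypothetical names: it splits on whitespace, which works for the example line as shown here, whereas real PDB files use fixed-width columns and a robust reader should slice by column instead.

```python
from typing import NamedTuple


class LinkRecord(NamedTuple):
    atom1: str
    residue1: str
    chain1: str
    number1: int
    atom2: str
    residue2: str
    chain2: str
    number2: int
    length: float


def parse_link_record(line):
    """Parse a whitespace-separated LINK line with all twelve fields present."""
    fields = line.split()
    if len(fields) != 12 or fields[0] != "LINK":
        raise ValueError(f"not a complete LINK record: {line!r}")
    return LinkRecord(
        atom1=fields[1], residue1=fields[2], chain1=fields[3],
        number1=int(fields[4]),
        atom2=fields[5], residue2=fields[6], chain2=fields[7],
        number2=int(fields[8]),
        # fields[9] and fields[10] are the symmetry operators, skipped here.
        length=float(fields[11]),
    )


link = parse_link_record("LINK O3 Glc A 1 C1 Fuc B 1 1555 1555 1.5")
print(link.atom1, link.residue1, link.chain1, link.number1)
print(link.atom2, link.residue2, link.chain2, link.number2)
print(link.length)
```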
Note that if the LINK records are not in order, or HETNAM records are not in a Rosetta format, we will fail to load. In the next tutorial we will use auto-detection to do this. For now, we know Lex.pdb will load OK.
Lex = pose_from_file('inputs/glycans/Lex.pdb')
pm.apply(Lex)
core.import_pose.import_pose: File 'inputs/Lex.pdb' automatically determined to be of type PDB core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc1 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Gal2 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Fuc3 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-L-Fucp:non-reducing_end 3. Returning BOGUS ID instead. core.conformation.Residue: [ WARNING ] missing an atom: 3 H1 that depends on a nonexistent polymer connection! core.conformation.Residue: [ WARNING ] --> generating it using idealized coordinates. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-L-Fucp:non-reducing_end 3. Returning BOGUS ID instead. core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
You may notice when viewing the structure in PyMOL that the hybridization of the carbonyl of the amido functionality of the N-acetyl group is wrong. This is because of an error in the model deposited in the PDB from which this file was generated. This is, unfortunately, a very common problem with sugar structures found in the PDB. It is always useful to use http://www.glycosciences.de to identify any errors in the solution PDB structure before working with them in Rosetta. The referenced paper, Automatically Fixing Errors in Glycoprotein Structures with Rosetta can be used as a guide to fixing these.
You may also have noticed that the `inputs/glycans/Lex.pdb` file indicated in its `HETNAM` records that Glc1 was actually an N-acetylglucosamine (GlcNAc) with the indication `2-acetylamino-2-deoxy-`. This is optional and is helpful for human-readability, but Rosetta only needs to know the base `ResidueType` of each sugar residue; the specific `VariantType`s needed (most sugar modifications are treated as `VariantType`s) are determined automatically from the atom names in the `HETATM` records for the residue. Anything after the comma is ignored.
Print the `Pose` to see how the `FoldTree` is defined.
## BEGIN SOLUTION
print(Lex)
## END SOLUTION
PDB file name: inputs/Lex.pdb Total residues: 3 Sequence: ZZZ Fold tree: FOLD_TREE EDGE 1 2 -1 EDGE 1 3 -2 O3 C1
Note the `CHEMICAL` `Edge` (`-2`). This is Rosetta’s way of indicating a branch backbone connection. Unlike a standard `POLYMER` `Edge` (`-1`), this one tells you which atoms are involved.
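The printed fold tree is easy to pick apart with a few lines of Python. The sketch below is a hypothetical text parser of ours, not a Rosetta API: it assumes the printed format seen in this tutorial, where only `-2` edges are followed by two atom names, and it does not handle other edge types specially.

```python
POLYMER = -1
CHEMICAL = -2


def parse_fold_tree(text):
    """Turn a printed FOLD_TREE string into a list of edge dictionaries."""
    tokens = text.split()
    if not tokens or tokens[0] != "FOLD_TREE":
        raise ValueError("expected a string starting with FOLD_TREE")
    edges = []
    i = 1
    while i < len(tokens):
        if tokens[i] != "EDGE":
            raise ValueError(f"unexpected token: {tokens[i]!r}")
        start, stop, label = (int(t) for t in tokens[i + 1:i + 4])
        i += 4
        edge = {"start": start, "stop": stop, "label": label}
        if label == CHEMICAL:
            # Branch edges also name the two bonded atoms.
            edge["atoms"] = (tokens[i], tokens[i + 1])
            i += 2
        edges.append(edge)
    return edges


def branch_origins(edges):
    """Residues from which a CHEMICAL (branch) edge starts."""
    return sorted({e["start"] for e in edges if e["label"] == CHEMICAL})


lex_edges = parse_fold_tree("FOLD_TREE EDGE 1 2 -1 EDGE 1 3 -2 O3 C1")
print(lex_edges)
print(branch_origins(lex_edges))  # [1]
```

For Lex, the only branch starts at residue 1, with O3 of that residue bonded to C1 of residue 3.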
Can you see now why φ and ψ are defined the way they are? If they were defined as in AA residues, they would not have unique definitions, since GlcNAc is a branch point. A monosaccharide can have multiple children, but it can never have more than a single parent.
Note that for this oligosaccharide χ3(1) is equivalent to ψ(3) and χ4(1) is equivalent to ψ(2). Make sure that you understand why!
Lex.chi(3, 1), Lex.psi(3)
(-97.03727535363538, -97.03727535363538)
Lex.chi(4, 1), Lex.psi(2)
(135.6468768989725, 135.6468768989725)
For chemically modified sugars, χ angles are redefined at the positions where substitution has occurred. For new χs that have come into existence from the addition of new atoms and bonds, new definitions are added to new indices. For example, for GlcN2Ac residue 1, χC2–N2–C′–Cα′ is accessed through `chi(7, 1)`.
Lex.chi(2, 1)
-230.8915297047683
Lex.set_chi(2, 1, 180)
pm.apply(Lex)
Lex.chi(7, 1)
179.81012671885887
Lex.set_chi(7, 1, 0)
pm.apply(Lex)
Branching does not have to occur at sugars; a glycan can be attached to the nitrogen of an ASN or the oxygen of a SER or THR. N-linked glycans themselves tend to be branched structures.
We will cover more on linked glycan trees in the next tutorial through the `GlycanTreeSet` object, which is always present in a pose that has carbohydrates.
N_linked = pose_from_file('inputs/glycans/N-linked_14-mer_glycan.pdb')
pm.apply(N_linked)
print(N_linked)
core.import_pose.import_pose: File 'inputs/N-linked_14-mer_glycan.pdb' automatically determined to be of type PDB core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc6 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc7 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man8 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man9 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man10 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man11 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc12 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc13 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc14 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man15 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. 
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man16 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man17 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man18 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Man19 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-beta-D-Glcp:2-AcNH 6. Returning BOGUS ID instead. core.conformation.Residue: [ WARNING ] missing an atom: 6 H1 that depends on a nonexistent polymer connection! core.conformation.Residue: [ WARNING ] --> generating it using idealized coordinates. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->3)-alpha-D-Manp:->6)-branch 15. Returning BOGUS ID instead. core.conformation.Residue: [ WARNING ] missing an atom: 15 H1 that depends on a nonexistent polymer connection! core.conformation.Residue: [ WARNING ] --> generating it using idealized coordinates. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->2)-alpha-D-Manp 18. Returning BOGUS ID instead. core.conformation.Residue: [ WARNING ] missing an atom: 18 H1 that depends on a nonexistent polymer connection! core.conformation.Residue: [ WARNING ] --> generating it using idealized coordinates. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-beta-D-Glcp:2-AcNH 6. Returning BOGUS ID instead. 
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->3)-alpha-D-Manp:->6)-branch 15. Returning BOGUS ID instead. core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->2)-alpha-D-Manp 18. Returning BOGUS ID instead. core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees. PDB file name: inputs/N-linked_14-mer_glycan.pdb Total residues: 19 Sequence: ANASAZZZZZZZZZZZZZZ Fold tree: FOLD_TREE EDGE 1 5 -1 EDGE 2 6 -2 ND2 C1 EDGE 6 14 -1 EDGE 8 15 -2 O6 C1 EDGE 15 17 -1 EDGE 15 18 -2 O6 C1 EDGE 18 19 -1
# Print the sequence of each of the four chains (chain numbering starts at 1).
for i in range(4):
    print(N_linked.chain_sequence(i + 1))
ANASA
alpha-D-Glcp-(1->3)-alpha-D-Glcp-(1->3)-alpha-D-Glcp-(1->3)-alpha-D-Manp-(1->2)-alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-beta-D-Manp-(1->4)-beta-D-GlcpNAc-(1->4)-beta-D-GlcpNAc-
alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-alpha-D-Manp-
alpha-D-Manp-(1->2)-alpha-D-Manp-
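To make the printed chain sequences easier to read, the following plain-Python sketch splits one of them into its residues and linkages. The helper `split_chain_sequence` is hypothetical (not a PyRosetta function), and it assumes a linear chain written as in the output above, with `-(n->m)-` between residues and a trailing hyphen at the end; it does not handle bracketed branch notation.

```python
import re

# Matches one linkage such as "-(1->3)-" and captures both positions.
LINKAGE = re.compile(r"-\((\d+)->(\d+)\)-")


def split_chain_sequence(sequence):
    """Return (residues, linkages) for a linear glycan chain string."""
    # Remove the trailing hyphen that ends each printed glycan chain.
    body = sequence.rstrip("-")
    # Collect the (from, to) positions of every linkage, in order.
    linkages = [(int(a), int(b)) for a, b in LINKAGE.findall(body)]
    # Split on the linkages (no capture groups) to get the residue names.
    residues = re.split(r"-\(\d+->\d+\)-", body)
    return residues, linkages


branch = "alpha-D-Manp-(1->2)-alpha-D-Manp-(1->3)-alpha-D-Manp-"
residues, linkages = split_chain_sequence(branch)
print(residues)   # three alpha-D-Manp residues
print(linkages)   # the (1->2) and (1->3) linkages between them
```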
O_linked = pose_from_file('inputs/glycans/O_glycan.pdb')
pm.apply(O_linked)
core.import_pose.import_pose: File 'inputs/O_glycan.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 0 .pdb records with unknown format to search for Rosetta-specific comments.
core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc4 has an unfavorable ring conformation; the coordinates for this input structure may have been poorly assigned.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-D-Glcp:non-reducing_end 4. Returning BOGUS ID instead.
core.conformation.Residue: [ WARNING ] missing an atom: 4 H1 that depends on a nonexistent polymer connection!
core.conformation.Residue: [ WARNING ] --> generating it using idealized coordinates.
core.chemical.AtomICoor: [ WARNING ] IcoorAtomID::atom_id(): Cannot get atom_id for POLYMER_LOWER of residue ->4)-alpha-D-Glcp:non-reducing_end 4. Returning BOGUS ID instead.
core.conformation.carbohydrates.GlycanTreeSet: Setting up Glycan Trees
core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.
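When loading larger glycans, the ring-conformation warnings can be easy to miss among the other messages. The sketch below is a plain-Python illustration of pulling the flagged residues out of captured log text. The helper `flagged_residues` is hypothetical, and the pattern assumes the warning wording shown in the output above; the number after the three-letter code appears to be the residue's position in the pose.

```python
import re

# Two lines copied from the loading output above.
LOG = (
    "core.io.pose_from_sfr.PoseFromSFRBuilder: [ WARNING ] Glc4 has an "
    "unfavorable ring conformation; the coordinates for this input "
    "structure may have been poorly assigned.\n"
    "core.conformation.carbohydrates.GlycanTreeSet: Found 1 glycan trees.\n"
)

# Assumed warning format: "<code><number> has an unfavorable ring conformation".
RING_WARNING = re.compile(
    r"\[ WARNING \] ([A-Za-z]+)(\d+) has an unfavorable ring conformation"
)


def flagged_residues(log_text):
    """Return (three-letter code, residue number) for each ring warning."""
    return [(name, int(num)) for name, num in RING_WARNING.findall(log_text)]


print(flagged_residues(LOG))
```

Residues reported this way are reasonable candidates for a closer look in the viewer before sampling.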
`set_phi()` and `set_psi()` still work when a glycan is linked to a peptide. (Below, we use `pdb_info()` to help us select the residue that we want. In this case, in the `.pdb` file, the glycan is chain B.)
N_linked.set_phi(N_linked.pdb_info().pdb2pose("B", 1), 180)
pm.apply(N_linked)
Notice that in this case ψ and ω affect the side-chain torsions (χs) of the asparagine residue. This is another case where there are multiple ways of both naming and accessing the same specific torsion angles.
Conjugated glycans can also be built from sequence in two steps: first create the peptide portion, either by loading a `.pdb` file or from sequence, and then attach the glycan with the `glycosylate_pose()` function (which must be imported first). For example, to glycosylate an ASA peptide with a single glucose at position 2 of the peptide, we perform the following:
Here, we will glycosylate a simple peptide using the function `glycosylate_pose()`. In the next tutorial, we will use a Mover interface to this function.
peptide = pose_from_sequence('ASA')
pm.apply(peptide)
from pyrosetta.rosetta.core.pose.carbohydrates import glycosylate_pose, glycosylate_pose_by_file
glycosylate_pose(peptide, 2, 'Glcp')
pm.apply(peptide)
core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 5 ->4)-alpha-D-Glcp:non-reducing_end anchor: OG 2 root: C1
core.pose.carbohydrates.util: Glycosylated pose with a(n) Glcp-OGSER2 bond.
core.pose.carbohydrates.util: Idealizing glycosidic torsions.
Here, we used the main function to glycosylate a pose. In the next tutorial, we will use a Mover interface to do so.
It is also possible to glycosylate a pose with common glycans found in the database. These files end in the `.iupac` extension and are simply IUPAC sequences just as we have been using throughout this chapter.
Here is a list of some of the common `.iupac` files in the database:
bisected_fucosylated_N-glycan_core.iupac
bisected_N-glycan_core.iupac
common_names.txt
core_1_O-glycan.iupac
core_2_O-glycan.iupac
core_3_O-glycan.iupac
core_4_O-glycan.iupac
core_5_O-glycan.iupac
core_6_O-glycan.iupac
core_7_O-glycan.iupac
core_8_O-glycan.iupac
fucosylated_N-glycan_core.iupac
high-mannose_N-glycan_core.iupac
hybrid_bisected_fucosylated_N-glycan_core.iupac
hybrid_bisected_N-glycan_core.iupac
hybrid_fucosylated_N-glycan_core.iupac
hybrid_N-glycan_core.iupac
man5.iupac
man9.iupac
N-glycan_core.iupac
# Start again from a fresh, unglycosylated peptide.
peptide = pose_from_sequence('ASA')
pm.apply(peptide)
glycosylate_pose_by_file(peptide, 2, 'core_5_O-glycan')
pm.apply(peptide)
core.conformation.Conformation: appending residue by a chemical bond in the foldtree: 6 ->3)-alpha-D-Galp:2-AcNH anchor: OG 2 root: C1
core.pose.carbohydrates.util: Glycosylated pose with a(n) a-D-GalpNAc-(1->3)-a-D-GalpNAc--OGSER2 bond.
core.pose.carbohydrates.util: Idealizing glycosidic torsions.
You now have a grasp of the basics of RosettaCarbohydrates. Please continue on to the next tutorial for more on glycan residue selection and the various movers that are useful when working with glycans.
Chapter contributors: