Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", as well as your name and collaborators below:
NAME = ""
COLLABORATORS = ""
Keywords: RosettaScripts, script, xml, XMLObjects
RosettaScripts is another way to script custom protocols in PyRosetta. It is much simpler than PyRosetta, yet it can be extremely powerful, and it is well documented. There are also many publications that give RosettaScripts examples, or that provide whole protocols as a RosettaScript instead of a mover or application. In addition, some early Rosetta code was written with RosettaScripts in mind, and some of its important variables may still only be fully accessible via RosettaScripts.
Recent versions of Rosetta have enabled full RosettaScripts protocols to be run in PyRosetta. A new class called XmlObjects has also enabled the setup of specific Rosetta class types in PyRosetta from XML instead of constructing them in code. This tutorial will introduce how to use this integration to get the most out of Rosetta. Note that some tutorials, such as the parametric protein design notebook, use RosettaScripts almost exclusively, as it is simpler to use RosettaScripts than to set up everything manually in code.
A RosettaScript is made up of different sections where different types of Rosetta classes are constructed. You will see many of these types throughout the notebooks to come. Briefly:
ScoreFunctions: A score function evaluates the energy of a pose through physical and statistical energy terms.
ResidueSelectors: These select a list of residues in a pose according to some criteria.
Movers: These do things to a pose. They all have an apply() method that you will see shortly.
TaskOperations: These control side-chain packing and design.
SimpleMetrics: These return some metric value of a pose. This value can be a real number, a string, or a composite of values.
<ROSETTASCRIPTS>
<SCOREFXNS>
</SCOREFXNS>
<RESIDUE_SELECTORS>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
</TASKOPERATIONS>
<SIMPLE_METRICS>
</SIMPLE_METRICS>
<FILTERS>
</FILTERS>
<MOVERS>
</MOVERS>
<PROTOCOLS>
</PROTOCOLS>
<OUTPUT />
</ROSETTASCRIPTS>
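To make the layout concrete, here is a minimal sketch that builds the same empty skeleton with Python's standard xml.etree.ElementTree module (not part of Rosetta). The section list is copied from the skeleton above; ElementTree writes empty blocks as self-closing tags such as <SCOREFXNS />, which should be equivalent XML.

```python
import xml.etree.ElementTree as ET

# Section blocks, in the order used by the skeleton above.
SECTIONS = [
    "SCOREFXNS",
    "RESIDUE_SELECTORS",
    "TASKOPERATIONS",
    "SIMPLE_METRICS",
    "FILTERS",
    "MOVERS",
    "PROTOCOLS",
]

def make_skeleton() -> str:
    """Return an empty RosettaScripts skeleton as an XML string."""
    root = ET.Element("ROSETTASCRIPTS")
    for section in SECTIONS:
        ET.SubElement(root, section)  # empty block, filled in later
    ET.SubElement(root, "OUTPUT")
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")

print(make_skeleton())
```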
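Anything outside of the < > notation is ignored and can be used to comment the XML file. As an example, the script below (used later in this notebook as inputs/min_L1.xml) minimizes the backbone and side chains of the L1 CDR of an antibody, running a set of SimpleMetrics before and after minimization: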
<ROSETTASCRIPTS>
<SCOREFXNS>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<CDR name="L1" cdrs="L1"/>
</RESIDUE_SELECTORS>
<MOVE_MAP_FACTORIES>
<MoveMapFactory name="movemap_L1" bb="0" chi="0">
<Backbone residue_selector="L1" />
<Chi residue_selector="L1" />
</MoveMapFactory>
</MOVE_MAP_FACTORIES>
<SIMPLE_METRICS>
<TimingProfileMetric name="timing" />
<SelectedResiduesMetric name="rosetta_sele" residue_selector="L1" rosetta_numbering="1"/>
<SelectedResiduesPyMOLMetric name="pymol_selection" residue_selector="L1" />
<SequenceMetric name="sequence" residue_selector="L1" />
<SecondaryStructureMetric name="ss" residue_selector="L1" />
</SIMPLE_METRICS>
<MOVERS>
<MinMover name="min_mover" movemap_factory="movemap_L1" tolerance=".1" />
<RunSimpleMetrics name="run_metrics1" metrics="pymol_selection,sequence,ss,rosetta_sele" prefix="m1_" />
<RunSimpleMetrics name="run_metrics2" metrics="timing,ss" prefix="m2_" />
</MOVERS>
<PROTOCOLS>
<Add mover_name="run_metrics1"/>
<Add mover_name="min_mover" />
<Add mover_name="run_metrics2" />
</PROTOCOLS>
</ROSETTASCRIPTS>
Rosetta will carry out the operations in the order specified in PROTOCOLS. An important point is that SimpleMetrics and Filters never change the sequence or conformation of the structure.
The movers do change the pose, and the output file will be the result of sequentially applying the movers in the protocols section. The standard scores of the output will be carried over from any protocol doing scoring, unless the OUTPUT tag is specified, in which case the corresponding score function from the SCOREFXNS block will be used.
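Since PROTOCOLS refers to movers by name, it can help to check the order and the names before handing a script to Rosetta. The sketch below uses only the standard library and two hypothetical helpers. It is a simplified check: it only looks at mover_name attributes and ignores Add entries that reference filters.

```python
import xml.etree.ElementTree as ET

def protocol_order(script: str) -> list[str]:
    """Return mover names in the order they are added in <PROTOCOLS>."""
    root = ET.fromstring(script)
    protocols = root.find("PROTOCOLS")
    if protocols is None:
        return []
    # Skip Add tags without mover_name (e.g. ones that add filters).
    return [add.get("mover_name") for add in protocols.findall("Add")
            if add.get("mover_name") is not None]

def undefined_movers(script: str) -> list[str]:
    """Return movers used in <PROTOCOLS> that are not defined in <MOVERS>."""
    root = ET.fromstring(script)
    movers = root.find("MOVERS")
    defined = {m.get("name") for m in movers} if movers is not None else set()
    return [name for name in protocol_order(script) if name not in defined]

# Trimmed version of the example above, with run_metrics2 left undefined.
script = """
<ROSETTASCRIPTS>
  <MOVERS>
    <MinMover name="min_mover" />
    <RunSimpleMetrics name="run_metrics1" metrics="ss" prefix="m1_" />
  </MOVERS>
  <PROTOCOLS>
    <Add mover_name="run_metrics1"/>
    <Add mover_name="min_mover" />
    <Add mover_name="run_metrics2" />
  </PROTOCOLS>
</ROSETTASCRIPTS>
"""
print(protocol_order(script))    # ['run_metrics1', 'min_mover', 'run_metrics2']
print(undefined_movers(script))  # ['run_metrics2']
```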
We recommend reading up on RosettaScripts in the documentation linked below. Note that the documentation lists ALL accessible components of each type of Rosetta class (every mover, filter, residue selector, and so on). This is extremely useful to get an idea of what Rosetta can do and how to use it in PyRosetta.
https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.setup()
    print("Notebook is set for PyRosetta use in Colab. Have fun!")
Here we will use the parser to read a whole script and generate a ParsedProtocol (a mover). This mover can then be run with its apply method on a pose of interest.
Let's run the protocol above. Here we read it directly from the XML file, inputs/min_L1.xml.
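from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
# Initialize Rosetta, reading extra options from the inputs/rabd/common flags file
init('-no_fconfig @inputs/rabd/common')
pose = pose_from_pdb("inputs/rabd/my_ab.pdb")
original_pose = pose.clone()  # keep an untouched copy to reuse below
# Parse the XML file into a ParsedProtocol mover, then run it on the pose
parser = RosettaScriptsParser()
protocol = parser.generate_mover_and_apply_to_pose(pose, "inputs/min_L1.xml")
protocol.apply(pose)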
Next, we will use XmlObjects to create a protocol from a string. Note that, internally, XmlObjects uses special functionality of the RosettaScriptsParser. XmlObjects also has a create_from_file method that takes a path to an XML file.
pose = original_pose.clone()
min_L1 = """
<ROSETTASCRIPTS>
<SCOREFXNS>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<CDR name="L1" cdrs="L1"/>
</RESIDUE_SELECTORS>
<MOVE_MAP_FACTORIES>
<MoveMapFactory name="movemap_L1" bb="0" chi="0">
<Backbone residue_selector="L1" />
<Chi residue_selector="L1" />
</MoveMapFactory>
</MOVE_MAP_FACTORIES>
<SIMPLE_METRICS>
<TimingProfileMetric name="timing" />
<SelectedResiduesMetric name="rosetta_sele" residue_selector="L1" rosetta_numbering="1"/>
<SelectedResiduesPyMOLMetric name="pymol_selection" residue_selector="L1" />
<SequenceMetric name="sequence" residue_selector="L1" />
<SecondaryStructureMetric name="ss" residue_selector="L1" />
</SIMPLE_METRICS>
<MOVERS>
<MinMover name="min_mover" movemap_factory="movemap_L1" tolerance=".1" />
<RunSimpleMetrics name="run_metrics1" metrics="pymol_selection,sequence,ss,rosetta_sele" prefix="m1_" />
<RunSimpleMetrics name="run_metrics2" metrics="timing,ss" prefix="m2_" />
</MOVERS>
<PROTOCOLS>
<Add mover_name="run_metrics1"/>
<Add mover_name="min_mover" />
<Add mover_name="run_metrics2" />
</PROTOCOLS>
</ROSETTASCRIPTS>
"""
xml = XmlObjects.create_from_string(min_L1)  # parse and validate the whole script
protocol = xml.get_mover("ParsedProtocol")  # the mover built from the PROTOCOLS block
protocol.apply(pose)
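Because the protocol is just a string, it can also be generated in Python. Below is a sketch of a hypothetical helper, make_min_cdr_xml, that fills in a trimmed version of the min_L1 script (metrics removed for brevity) for any CDR name. The template and helper are illustrative assumptions, not part of PyRosetta; the resulting string could then be passed to XmlObjects.create_from_string as above.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: the min_L1 script with the CDR name as a placeholder.
MIN_CDR_TEMPLATE = """
<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <CDR name="{cdr}" cdrs="{cdr}"/>
  </RESIDUE_SELECTORS>
  <MOVE_MAP_FACTORIES>
    <MoveMapFactory name="movemap_{cdr}" bb="0" chi="0">
      <Backbone residue_selector="{cdr}" />
      <Chi residue_selector="{cdr}" />
    </MoveMapFactory>
  </MOVE_MAP_FACTORIES>
  <MOVERS>
    <MinMover name="min_mover" movemap_factory="movemap_{cdr}" tolerance="{tolerance}" />
  </MOVERS>
  <PROTOCOLS>
    <Add mover_name="min_mover" />
  </PROTOCOLS>
</ROSETTASCRIPTS>
"""

def make_min_cdr_xml(cdr: str, tolerance: float = 0.1) -> str:
    """Return a minimization script for one CDR (hypothetical helper)."""
    return MIN_CDR_TEMPLATE.format(cdr=cdr, tolerance=tolerance)

h3_script = make_min_cdr_xml("H3")
# In PyRosetta (not run here): xml = XmlObjects.create_from_string(h3_script)
print(h3_script)
```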
Here we will use the XmlObjects instance that we set up from our script to pull out a specific component. Note that while this is very useful for running predefined Rosetta objects, we will not have any tab completion for it, as it will be a generic type, which means we will be unable to further modify it.
Let's grab the residue selector and then see which residues are in L1.
L1_sele = xml.get_residue_selector("L1")
L1_res = L1_sele.apply(pose)  # one boolean per residue, indexed from 1
for i in range(1, len(L1_res)+1):
    if L1_res[i]:
        print("L1 Residue: ", pose.pdb_info().pose2pdb(i), ":", i)
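The loop above walks a 1-indexed vector of booleans, one entry per residue. As a sketch, the same logic can be wrapped in a small hypothetical helper that returns the selected pose numbers. A plain Python list stands in for the selector output here, and enumerate(..., start=1) maps Python's 0-based positions onto Rosetta's 1-based numbering.

```python
def selected_positions(subset) -> list[int]:
    """Return 1-based pose numbers whose entry in `subset` is True."""
    return [i for i, is_selected in enumerate(subset, start=1) if is_selected]

# Plain list standing in for a selector result: residues 2, 3 and 5 selected.
print(selected_positions([False, True, True, False, True]))  # [2, 3, 5]
```

To use it with the selection above, first copy it into a list using the same indexing as the loop: mask = [L1_res[i] for i in range(1, len(L1_res) + 1)], then call selected_positions(mask).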
Here, instead of parsing a whole script, we'll simply create the same L1 selector from its XML string. This can be done for nearly every Rosetta class type in the script. The 'static' in the function name means that we do not have to construct an XmlObjects instance first; we can call the function directly on the class.
L1_sele = XmlObjects.static_get_residue_selector('<CDR name="L1" cdrs="L1"/>')
L1_res = L1_sele.apply(pose)
for i in range(1, len(L1_res)+1):
    if L1_res[i]:
        print("L1 Residue: ", pose.pdb_info().pose2pdb(i), ":", i)
Do these residues match what we had before? Why do both of these seem a bit slower? The actual residue selection is extremely quick, but validating the XML against a schema (which checks that the string you passed is valid) takes time.
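Since validation dominates the cost, it pays to parse an XML string once and reuse the resulting object rather than re-parsing it inside a loop. A minimal sketch with functools.lru_cache follows; slow_parse is a hypothetical stand-in for a call such as XmlObjects.static_get_residue_selector, with a counter to show that it runs only once per distinct string.

```python
import functools

def make_cached(factory):
    """Wrap a string -> object factory so each distinct string is parsed only once."""
    return functools.lru_cache(maxsize=None)(factory)

# Hypothetical stand-in for an expensive parse; counts how often it really runs.
calls = {"n": 0}

def slow_parse(xml_string: str) -> str:
    calls["n"] += 1
    return f"parsed<{xml_string}>"

get_selector = make_cached(slow_parse)
first = get_selector('<CDR name="L1" cdrs="L1"/>')
second = get_selector('<CDR name="L1" cdrs="L1"/>')  # served from the cache
print(calls["n"])  # 1
```

Caching hands back the same object every time. That should be fine for something like a residue selector that you only call apply on, but if you plan to modify the returned object, parse a fresh copy instead.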
And that's it! That should be everything you need to know about RosettaScripts in PyRosetta. Enjoy!
For XmlObjects, each type has a corresponding function (with and without static). These are listed below, but tab completion will help you here. As you have seen above, the static functions are called on the class itself, XmlObjects, while the non-static functions are called on an instance of the class after parsing a script; in our example, that instance was called xml.
.get_score_function / .static_get_score_function
.get_residue_selector / .static_get_residue_selector
.get_simple_metric / .static_get_simple_metric
.get_filter / .static_get_filter
.get_mover / .static_get_mover
.get_task_operation / .static_get_task_operation
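As a sketch of how these getters line up with the script sections, the hypothetical helper below scans a script with ElementTree and reports which getter would retrieve each named object. The section-to-getter table is an assumption based on the list above; sections without a getter in that list, such as MOVE_MAP_FACTORIES and PROTOCOLS, are skipped. The whole protocol itself is retrieved with get_mover("ParsedProtocol"), as shown earlier.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from script section to XmlObjects getter (see list above).
GETTERS = {
    "SCOREFXNS": "get_score_function",
    "RESIDUE_SELECTORS": "get_residue_selector",
    "SIMPLE_METRICS": "get_simple_metric",
    "FILTERS": "get_filter",
    "MOVERS": "get_mover",
    "TASKOPERATIONS": "get_task_operation",
}

def list_retrievable(script: str) -> list[tuple[str, str]]:
    """Return (name, getter) pairs for named objects, in document order."""
    root = ET.fromstring(script)
    pairs = []
    for block in root:
        getter = GETTERS.get(block.tag)
        if getter is None:
            continue  # e.g. MOVE_MAP_FACTORIES or PROTOCOLS
        for child in block:
            name = child.get("name")
            if name is not None:
                pairs.append((name, getter))
    return pairs

example = """
<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <CDR name="L1" cdrs="L1"/>
  </RESIDUE_SELECTORS>
  <MOVE_MAP_FACTORIES>
    <MoveMapFactory name="movemap_L1" bb="0" chi="0"/>
  </MOVE_MAP_FACTORIES>
  <SIMPLE_METRICS>
    <SequenceMetric name="sequence" residue_selector="L1" />
  </SIMPLE_METRICS>
  <MOVERS>
    <MinMover name="min_mover" movemap_factory="movemap_L1" />
  </MOVERS>
</ROSETTASCRIPTS>
"""
for name, getter in list_retrievable(example):
    print(f"xml.{getter}({name!r})")
```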