Notebook

Transformations¶

Apart from modifying graphs manually using the SDFG API, you can also define more complex behavior that matches certain patterns in SDFGs and makes changes. This behavior is expressed in classes called Transformations. Transformations are a powerful tool to optimize applications in DaCe. You can go from naive code to state-of-the-art performance using only transformations.

There are two general types of transformations: pattern-matching transformations (extending the Transformation class) and subgraph transformations (extending the SubgraphTransformation class). The former is based on one or more specific subgraph expressions, and the latter can be applied to any subgraph that passes the conditions. Internally, there are two methods to a Transformation: can_be_applied, which returns True if it can be applied on a given subgraph; and apply, which modifies the graph using the SDFG API. Some transformations run automatically on each graph (as part of the simplification pass), and others have to be called manually.

You can find a list of the built-in standard transformations here. While these transformations cover many use-cases, they cannot cover everything (for example, domain-specific behavior). Thus, Transformations are easily extensible and below we show how to register a new one. You can register your transformations to be pattern-based, subgraph-based, or not, upon defining a new class.

This tutorial will deal with the different ways transformations can be applied on code:

Apply anywhere or everywhere
Enumerate matches of a transformation
Apply at a specific location
Interactive optimization (command-line, Visual Studio Code)
Create your own transformation

We will use the following example throughout this tutorial:

In [1]:

import dace
@dace.function
def dbladd(A: dace.float64[1000, 1000], B: dace.float64[1000, 1000]):
    dbl = B
    return A + dbl * B

Applying Transformations¶

The easiest way to apply a transformation is to import the Transformation subclass and call apply_transformations or apply_transformations_repeated on the SDFG. The methods accept a transformation or a list of transformations, their parameters, and other parameters (for more info, see its documentation).

To demonstrate transformations, we will first show the raw SDFG of dbladd, without simplification:

In [2]:

sdfg = dbladd.to_sdfg(simplify=False)
sdfg

Out[2]:

There are four states in this SDFG, and an unnecessary copy to dbl. In order to fuse the states, we can apply the StateFusion transformation:

In [3]:

from dace.transformation.interstate import StateFusion
sdfg.apply_transformations(StateFusion)
sdfg

Out[3]:

This fused the first two states. Since we want to fuse the entire graph, we can use the method that repeatedly applies the transformation until no more states can be fused without breaking the correctness of the graph:

In [4]:

sdfg.apply_transformations_repeated(StateFusion)
sdfg

Out[4]:

Now that the dataflow aspects of the graph are clearer, it is easy to see that the dbl array is redundant. One of the simplification pass transformations in the DaCe standard library deals with redundant array copies when one array is transient and unused anywhere else. To invoke this transformation, we can either import it directly, or simply try to apply all simplification transformations. Note that this happens automatically when a Python function is defined without our special simplify=False argument:

In [5]:

sdfg.simplify()
sdfg

Out[5]:

Enumerating Matches¶

As graphs grow larger, there will be more than one match to a transformation. For pattern matching transformations, we can enumerate the matching subgraphs in an SDFG for a given transformation using the dace.transformation.optimizer.Optimizer class.

In the example below, we try to find which maps can be tiled (to increase the locality of the computation) within the current SDFG:

In [6]:

from dace.transformation.dataflow import MapTiling
from dace.transformation.optimizer import Optimizer

for xform in Optimizer(sdfg).get_pattern_matches(patterns=[MapTiling]):
    print('Match:', xform.print_match(sdfg))

Match: MapTiling in [MapEntry (_Mult__map[__i0=0:1000, __i1=0:1000])]
Match: MapTiling in [MapEntry (_Add__map[__i0=0:1000, __i1=0:1000])]

The transformation can then be applied by calling xform.apply(graph, sdfg), where graph may be the SDFG state in which the pattern is found, or the SDFG (for multi-state transformations).

Custom Subgraph Enumeration¶

If you want to match a certain subgraph (in the SDFG or within a state) without creating a transformation, you can use the enumerate_matches API in dace.transformation.pattern_matching. It uses the same mechanism as transformations but can be used for general pattern matching:

In [7]:

from dace.transformation.passes.pattern_matching import enumerate_matches
from dace.sdfg import utils as sdutil  # For creating path graphs

# Construct subgraph pattern (MapExit -> AccessNode -> MapEntry)
pattern = sdutil.node_path_graph(dace.nodes.MapExit, dace.nodes.AccessNode,
                                 dace.nodes.MapEntry)

# Loop over matches
for subgraph in enumerate_matches(sdfg, pattern):
    print("Match found in state", subgraph.graph.label, ". Nodes:", subgraph.nodes())

Match found in state BinOp_5 . Nodes: [MapExit (_Mult__map[__i0=0:1000, __i1=0:1000]), AccessNode (__tmp0), MapEntry (_Add__map[__i0=0:1000, __i1=0:1000])]

Apply to Specific Location¶

We can also invoke transformations at specific locations using the apply_to static function of each transformation. In each transformation class, a list of statically constructed nodes define the transformation structure. For example, in MapFusion, there are three such nodes, called first_map_exit, array, and second_map_entry. If you know which maps to apply the transformation to, simply use these three names as keyword arguments to the apply_to function. Below is an example that finds the multiplication map exit, addition map entry, and the array between them:

In [8]:

# Since there is only one state (thanks to StateFusion), we can use the first one in the SDFG
state = sdfg.node(0)

# The multiplication map is called "_Mult__map" (see above graph), we can query it
mult_exit = next(n for n in state.nodes() if isinstance(n, dace.nodes.MapExit) and n.label == '_Mult__map')
# Same goes for the addition entry
add_entry = next(n for n in state.nodes() if isinstance(n, dace.nodes.MapEntry) and n.label == '_Add__map')
# Since all redundant arrays have been removed by simplification, we can get the only transient
# array that remains in the graph
transient = next(aname for aname, desc in sdfg.arrays.items() if desc.transient)
access_node = next(n for n in state.nodes() if isinstance(n, dace.nodes.AccessNode) and n.data == transient)

# We will apply the transformation on these three nodes
print(mult_exit, '->', access_node, '->', add_entry)

_Mult__map[__i0=0:1000, __i1=0:1000] -> __tmp0 -> _Add__map[__i0=0:1000, __i1=0:1000]

In [9]:

from dace.transformation.dataflow import MapFusion

MapFusion.apply_to(sdfg,
                   first_map_exit=mult_exit,
                   array=access_node,
                   second_map_entry=add_entry)
sdfg

Out[9]:

and now the two maps are fused.

Subgraph Transformations¶

The same can be applied with subgraph transformations, but a subgraph is required instead of a pattern. In the following example we reload the SDFG (in the default mode, namely with SDFG simplification), and then apply SubgraphFusion on the entire state (namely all of its nodes, or state.nodes()). The result will be similar to fusing the two maps as above, but can generalize to fuse any number of maps, parallel regions, and other cases.

In [10]:

from dace.transformation.subgraph import SubgraphFusion

sdfg = dbladd.to_sdfg()

# Single-state SDFG
state = sdfg.node(0)

SubgraphFusion.apply_to(sdfg, state.nodes())
sdfg

Out[10]:

Interactive Optimization¶

Sometimes it is useful to apply transformations in sequence interactively. To open the prompt from Jupyter or Python, call SDFGOptimizer:

In [11]:

from dace.transformation.optimizer import SDFGOptimizer
sdfg = SDFGOptimizer(sdfg).optimize()

0. Transformation ElementWiseArrayOperation in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
1. Transformation ElementWiseArrayOperation2D in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
2. Transformation FPGATransformSDFG in []
3. Transformation FPGATransformState in [SDFGState (BinOp_5)]
4. Transformation GPUTransformLocalStorage in outer_fused[__i0=0:1000, __i1=0:1000]
5. Transformation GPUTransformMap in outer_fused[__i0=0:1000, __i1=0:1000]
6. Transformation GPUTransformSDFG in []
7. Transformation MapDimShuffle in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
8. Transformation MapExpansion in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
9. Transformation MapFission in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
10. Transformation MapTiling in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
11. Transformation MapTilingWithOverlap in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
12. Transformation MapUnroll in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
13. Transformation NestSDFG in []
14. Transformation ReductionNOperation in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])]
15. Transformation StripMining in outer_fused: ['__i0', '__i1']
16. Transformation TaskletFusion in [Tasklet (_Mult_), AccessNode (__tmp0), Tasklet (_Add_)]

You selected (MapExpansion$0) pattern MapExpansion in [MapEntry (outer_fused[__i0=0:1000, __i1=0:1000])] with parameters {}
0. Transformation ElementWiseArrayOperation in [MapEntry (outer_fused___i1[__i1=0:1000])]
1. Transformation FPGATransformSDFG in []
2. Transformation FPGATransformState in [SDFGState (BinOp_5)]
3. Transformation GPUGridStridedTiling in [MapEntry (outer_fused[__i0=0:1000]), MapEntry (outer_fused___i1[__i1=0:1000])]
4. Transformation GPUTransformLocalStorage in outer_fused[__i0=0:1000]
5. Transformation GPUTransformMap in outer_fused[__i0=0:1000]
6. Transformation GPUTransformMap in outer_fused___i1[__i1=0:1000]
7. Transformation GPUTransformSDFG in []
8. Transformation InLocalStorage in outer_fused[__i0=0:1000] -> outer_fused___i1[__i1=0:1000]
9. Transformation MPITransformMap in [MapEntry (outer_fused[__i0=0:1000])]
10. Transformation MPITransformMap in [MapEntry (outer_fused___i1[__i1=0:1000])]
11. Transformation MapDimShuffle in [MapEntry (outer_fused[__i0=0:1000])]
12. Transformation MapDimShuffle in [MapEntry (outer_fused___i1[__i1=0:1000])]
13. Transformation MapFission in [MapEntry (outer_fused___i1[__i1=0:1000])]
14. Transformation MapInterchange in [MapEntry (outer_fused[__i0=0:1000]), MapEntry (outer_fused___i1[__i1=0:1000])]
15. Transformation MapTiling in [MapEntry (outer_fused[__i0=0:1000])]
16. Transformation MapTiling in [MapEntry (outer_fused___i1[__i1=0:1000])]
17. Transformation MapTilingWithOverlap in [MapEntry (outer_fused[__i0=0:1000])]
18. Transformation MapTilingWithOverlap in [MapEntry (outer_fused___i1[__i1=0:1000])]
19. Transformation MapToForLoop in [MapEntry (outer_fused[__i0=0:1000])]
20. Transformation MapToForLoop in [MapEntry (outer_fused___i1[__i1=0:1000])]
21. Transformation MapUnroll in [MapEntry (outer_fused[__i0=0:1000])]
22. Transformation NestSDFG in []
23. Transformation OutLocalStorage in outer_fused___i1[__i1=0:1000] -> outer_fused[__i0=0:1000]
24. Transformation ReductionNOperation in [MapEntry (outer_fused___i1[__i1=0:1000])]
25. Transformation StripMining in outer_fused: ['__i0']
26. Transformation StripMining in outer_fused___i1: ['__i1']
27. Transformation TaskletFusion in [Tasklet (_Mult_), AccessNode (__tmp0), Tasklet (_Add_)]

You selected (MapTiling$0) pattern MapTiling in [MapEntry (outer_fused[__i0=0:1000])] with parameters {'tile_sizes': (128,)}
0. Transformation ElementWiseArrayOperation in [MapEntry (outer_fused___i1[__i1=0:1000])]
1. Transformation FPGATransformSDFG in []
2. Transformation FPGATransformState in [SDFGState (BinOp_5)]
3. Transformation GPUGridStridedTiling in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1]), MapEntry (outer_fused___i1[__i1=0:1000])]
4. Transformation GPUGridStridedTiling in [MapEntry (outer_fused[tile___i0=0:1000:128]), MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
5. Transformation GPUTransformLocalStorage in outer_fused[tile___i0=0:1000:128]
6. Transformation GPUTransformMap in outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1]
7. Transformation GPUTransformMap in outer_fused___i1[__i1=0:1000]
8. Transformation GPUTransformMap in outer_fused[tile___i0=0:1000:128]
9. Transformation GPUTransformSDFG in []
10. Transformation InLocalStorage in outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1] -> outer_fused___i1[__i1=0:1000]
11. Transformation InLocalStorage in outer_fused[tile___i0=0:1000:128] -> outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1]
12. Transformation MPITransformMap in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
13. Transformation MPITransformMap in [MapEntry (outer_fused___i1[__i1=0:1000])]
14. Transformation MPITransformMap in [MapEntry (outer_fused[tile___i0=0:1000:128])]
15. Transformation MapDimShuffle in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
16. Transformation MapDimShuffle in [MapEntry (outer_fused___i1[__i1=0:1000])]
17. Transformation MapDimShuffle in [MapEntry (outer_fused[tile___i0=0:1000:128])]
18. Transformation MapFission in [MapEntry (outer_fused___i1[__i1=0:1000])]
19. Transformation MapInterchange in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1]), MapEntry (outer_fused___i1[__i1=0:1000])]
20. Transformation MapTiling in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
21. Transformation MapTiling in [MapEntry (outer_fused___i1[__i1=0:1000])]
22. Transformation MapTiling in [MapEntry (outer_fused[tile___i0=0:1000:128])]
23. Transformation MapTilingWithOverlap in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
24. Transformation MapTilingWithOverlap in [MapEntry (outer_fused___i1[__i1=0:1000])]
25. Transformation MapTilingWithOverlap in [MapEntry (outer_fused[tile___i0=0:1000:128])]
26. Transformation MapToForLoop in [MapEntry (outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1])]
27. Transformation MapToForLoop in [MapEntry (outer_fused___i1[__i1=0:1000])]
28. Transformation MapToForLoop in [MapEntry (outer_fused[tile___i0=0:1000:128])]
29. Transformation MapUnroll in [MapEntry (outer_fused[tile___i0=0:1000:128])]
30. Transformation NestSDFG in []
31. Transformation OutLocalStorage in outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1] -> outer_fused[tile___i0=0:1000:128]
32. Transformation OutLocalStorage in outer_fused___i1[__i1=0:1000] -> outer_fused[__i0=tile___i0:Min(999, tile___i0 + 127) + 1]
33. Transformation ReductionNOperation in [MapEntry (outer_fused___i1[__i1=0:1000])]
34. Transformation StripMining in outer_fused: ['__i0']
35. Transformation StripMining in outer_fused___i1: ['__i1']
36. Transformation StripMining in outer_fused: ['tile___i0']
37. Transformation TaskletFusion in [Tasklet (_Mult_), AccessNode (__tmp0), Tasklet (_Add_)]

You did not select a valid option. Quitting optimization ...

The prompt can be used with numbers (e.g., 7 for the 8th transformation) or names (MapExpansion$0 is the first occurrence of MapExpansion). If parameters are given, as in the above example, they are called as it was a Python dictionary: MapTiling$0(tile_sizes=(128,)).

If the "Enter" key is pressed, the SDFG is no longer transformed and the function returns with the resulting graph.

Internally, the default behavior calls the SDFG console optimizer (dace.transformation.optimizer.SDFGOptimizer). This can be modified in the configuration value optimizer.interface.

Command-line interface¶

The console interactive optimizer can also be called from the command line directly, through the sdfgcc tool:

sdfgcc --optimize path/to/file.sdfg

You could also trigger the command-line interface every time a @dace function is called. It can be done by configuring a call hook, for example by setting the environment variable DACE_call_hooks to dace.cli_optimize_on_call.

Example:

DACE_call_hooks=dace.cli_optimize_on_call python myfile.py

Visual Studio Code¶

An extension that integrates DaCe into VSCode can be used to interactively edit and transform SDFGs.

To install it, go to the VSCode Marketplace and download the DaCe plugin. Alternatively, open an .sdfg file in VSCode, or search for the extension directly.

Upon opening an SDFG file, the viewer will prompt for transformations in the "SDFG Optimization" pane. As you change the view (panning, zooming, collapsing nodes), the transformations under "Viewport" will change.

Selecting nodes (through single click, ctrl-click, or the box select mode at the top pane) will add transformations and matching subgraph transformations to the "Selection" pane. History appears at the bottom and saves as part of the SDFG file, which you can then use to revert and apply new transformations. See the example below:

vscode plugin

Creating New Transformations¶

Extending the standard transformations is easy. New transformations can be used for domain-specific optimizations, or simply wrapping expertise gathered over time. In pattern-based transformations, there are three main parts to an implementation: the pattern(s), the match function, and the replacement (apply) method. Subgraph transformations behave exactly the same, but without the pattern part.

A pattern-based transformation extends the PatternTransformation class, whereas subgraph transformation extends SubgraphTransformation from the same module. There are several helper functions in dace.transformation.helpers and dace.transformation.subgraph.helpers that may make your life easier while writing transformations, please check them out in the documentation.

In this section, we will make a new pattern-based transformation that takes care of the SDFG we made in the last section:

In [12]:

sdfg

Out[12]:

As you can see, there is an unnecessary __tmp1 transient array between the two tasklets, as a result of SubgraphFusion. We can make a "Redundant array between tasklets" transformation that checks for such cases, and removes that array if it is not used anywhere else, directly connecting the two tasklets instead.

Using the three parts, we can define a pattern-based transformation:

Pattern: Our pattern graph should be Tasklet -> AccessNode -> Tasklet, which is the minimal subgraph in which this occurs. Since the transformation matches within a state, we will register it as a single-state transformation by extending SingleStateTransformation.
Match: We should only match such a subgaph if the array (a) is only a scalar or size 1, (b) is transient (i.e., not defined outside this SDFG), and (c) never used again apart from between those two tasklets. Only this way can we guarantee the correctness of the transformed program.
Replacement: If the transformation is matched, we will remove the array and reconnect the tasklets together.

For the pattern part, we must construct node objects as static fields within the class (that must start with an underscore in order to not be recognized as properties). Some of them require arguments, but anything can be set there. Then, we should define an implementation of the expressions class method to state how those node objects should be connected.

The match and replacement parts are implemented by the can_be_applied class method, returning a boolean for a specific subgraph candidate, and the apply instance method, which modifies the given SDFG using the SDFG API.

We can now start implementing the transformation:

In [13]:

from dace.transformation import transformation as xf
from dace.sdfg import utils as sdutil
from dace import registry


class RedundantArrayTasklet(xf.SingleStateTransformation):
    """ Removes a redundant transient array if and only if it is between two tasklets. """

    # Pattern: define pattern nodes
    tasklet_a = xf.PatternNode(dace.nodes.Tasklet)
    array = xf.PatternNode(dace.nodes.AccessNode)
    tasklet_b = xf.PatternNode(dace.nodes.Tasklet)

    # This method returns a list of graphs that represent the pattern
    @classmethod
    def expressions(cls):
        # We have only one expression, and it is a path graph between the three nodes
        return [sdutil.node_path_graph(cls.tasklet_a, cls.array, cls.tasklet_b)]

    # Match function
    def can_be_applied(self, graph, expr_index, sdfg, permissive=False):
        # Getting the actual node in the graph
        array_node = self.array

        # Get data descriptor from SDFG registry and access node
        desc = sdfg.arrays[array_node.data]

        # Match (a): Check array size
        if desc.total_size != 1:
            return False

        # Match (b): Check if transient
        if not desc.transient:
            return False

        # Match (c): Check if used again in any state in this SDFG
        if len([node for state in sdfg.nodes()
                for node in state.nodes()
                if isinstance(node, dace.nodes.AccessNode)
                and node.data == array_node.data]) > 1:
            return False

        # Match (c): Check if any other tasklet uses this transient
        if graph.in_degree(array_node) + graph.out_degree(array_node) != 2:
            return False

        # Match found!
        return True

    # Replacement (note that this is a method of a transformation instance)
    def apply(self, graph, sdfg):
        # Query the pattern match for the actual node
        array_node = self.array

        # Get incoming and outgoing edges of the access node
        # (there are only one of each according to `can_be_applied`)
        in_edge = state.in_edges(array_node)[0]
        out_edge = state.out_edges(array_node)[0]

        # Construct a new edge between the two nodes, using the input/output
        # nodes and connectors
        state.add_edge(in_edge.src, in_edge.src_conn,
                       out_edge.dst, out_edge.dst_conn,
                       dace.Memlet())

        # Finally, remove the redundant array node from the graph
        state.remove_node(array_node)

Now that the transformation is implemented, we can check if it works:

In [14]:

sdfg.apply_transformations(RedundantArrayTasklet)

Out[14]:

As the return value states the number of times a transformation was applied, the transformation was applied once. Let's look at the graph:

In [15]:

sdfg

Out[15]:

The node is now removed.

Composing Transformations¶

Transformations can call other transformations to create powerful compositions and make verification easier. Simply use one of the APIs to invoke a transformation (such as apply_to) within the code of another transformation to do so.

Optimization and Customization¶

There are more methods you can implement to optimize transformations:

annotates_memlets: If the method returns True, memlets will not be propagated after transformation is applied (for performance and/or overriding default propagation behavior).
match_to_str: Returns a string-representation of the match to customize printout in the command-line interface.