After IR trees have been put into a canonical form (described in the
previous section), they are used in generating assembly code. The
obvious way to do this is to macroexpand each IR tree node. For
is macroexpanded into the pseudo-assembly code at the right column:
TEMP(fp) t1 := fp CONST(10) t2 := 10 +(TEMP(fp),CONST(10)) t3 := t1+t2 CONST(3) t4 := 3 MOVE(MEM(...),CONST(3)) M[t3] := t4where ti stands for a temporary variable. This method generates very poor quality code. For example, the IR above can be done using only one instruction in most architectures:
M[fp+10] := 3.
A method, called maximum munch, generates better code, especially for RISC machines. The idea is to use tree pattern matching to map a tree pattern (a fragment of an IR tree) into a list of assembly instructions. These tree patterns are called tiles. For RISC we always have one-to-one mapping (one tile to one assembly instruction). Note that for RISC machines the tiles are small (very few number of IR nodes) but for CISC machines the tiles are usually large since the CISC instructions are very complex.
The following is the mapping of some tiles (left) into MIPS code (right):
CONST(c) li 'd0, c +(e0,e1) add 'd0, 's0, 's1 +(e0,CONST(c)) add 'd0, 's0, c *(e0,e1) mult 'd0, 's0, 's1 *(e0,CONST(2^k)) sll 'd0, 's0, k MEM(e0) lw 'd0, ('s0) MEM(+(e0,CONST(c))) lw 'd0, c('s0) MOVE(MEM(e0),e1) sw 's1, ('s0) MOVE(MEM(+(e0,CONST(c))),e1) sw 's1, c('s0) JUMP(NAME(X)) b X JUMP(e0) jr 's0 LABEL(X) X: nopHere
e1are subtrees of these IR tree nodes. Most of the assembly instructions above have one destination register
d0and a number of source registers,
s1. So if a tree pattern, such as
MOVE(MEM(e0),e1), has two input subtrees
e1, then their values are taken from the source registers
s1(which are the destinations of
e1). The quote in the assembly instructions is used to indicate that temporary registers should be selected so that there is this match of source-destination names.
To translate an IR tree into assembly code, we perform a tiling:
we cover the IR tree with non-overlapping tiles.
We can see that there are many different tilings.
Consider for example the tiger statement
a[i] := x, where
i is a temporary register, the
a address is stored at the
offset 20, and
x at the offset 10. The IR form is:
MOVE(MEM(+(MEM(+(fp,CONST(20))),*(TEMP(i),CONST(4)))),MEM(+(fp,CONST(10))))The following are two possible tilings of the IR:
lw r1, 20($fp) add r1, $fp, 20 sll r2, r1, 2 lw r1, (r1) add r1, r1, r2 sll r2, r1, 2 lw r2, 10($fp) add r1, r1, r2 sw r2, (r1) add r2, $fp, x lw r2, (r2) sw r2, (r1)The first tiling is obviously better since it can be executed faster.
It's highly desirable to do optimum tiling, ie, to generate to shortest instruction sequence (or alternatively the sequence with the fewest machine cycles). But this is not always easy to achieve. So most compilers do an optimal tiling where no two adjacent tiles can be combined into a tile of lower cost. Optimum tiling is not always optimal but it's close for RISC machines.
There are two main ways of performing optimum tiling: using maximal munch (a greedy algorithm) or using dynamic programming. In maximal munch you start from the IR root and from all matching tiles you select the one with the maximum number of IR nodes (largest tile). Then you go to the children (subtrees) of this tile and apply the algorithm recursively until you reach the tree leaves. The dynamic programming works from the leaves to the root. It assigns a cost to every tree node by considering every tile that matches the node and calculating the minimum value of:
For example, consider the following IR represented as a tree and its tiling using maximal munch:
The order of tiling is IEHBDGACF (ie, top-down).
After we set the tiles, we use a different register
ri for each connection
between the tiles. Then we expand the tiles into MIPS code by following the tile mapping table.
The order of tile expansion is ABCDEFGHI (ie, bottom-up).
The MIPS code is:
A lw r1, fp B lw r2, 8(r1) C lw r3, i D sll r4, r3, 2 E add r5, r2, r4 F lw r6, fp G lw r7, 16(r6) H add r8, r7, 1 I sw r8, (r5)