Next: 11 Liveness Analysis Up: CSE 5317/4305: Design and Previous: 9 Basic Blocks and Contents

10 Instruction Selection

After IR trees have been put into a canonical form (described in the previous section), they are used in generating assembly code. The obvious way to do this is to macroexpand each IR tree node. For example, MOVE(MEM(+(TEMP(fp),CONST(10))),CONST(3)) is macroexpanded into the pseudo-assembly code at the right column:

TEMP(fp)                                t1 := fp
CONST(10)                               t2 := 10
+(TEMP(fp),CONST(10))                   t3 := t1+t2
CONST(3)                                t4 := 3
MOVE(MEM(...),CONST(3))                 M[t3] := t4

where ti stands for a temporary variable. This method generates very poor quality code. For example, the IR above can be done using only one instruction in most architectures: M[fp+10] := 3.

A method, called maximum munch, generates better code, especially for RISC machines. The idea is to use tree pattern matching to map a tree pattern (a fragment of an IR tree) into a list of assembly instructions. These tree patterns are called tiles. For RISC we always have one-to-one mapping (one tile to one assembly instruction). Note that for RISC machines the tiles are small (very few number of IR nodes) but for CISC machines the tiles are usually large since the CISC instructions are very complex.

The following is the mapping of some tiles (left) into MIPS code (right):

CONST(c)                                li   'd0, c
+(e0,e1)                                add  'd0, 's0, 's1
+(e0,CONST(c))                          add  'd0, 's0, c
*(e0,e1)                                mult 'd0, 's0, 's1
*(e0,CONST(2^k))                        sll  'd0, 's0, k
MEM(e0)                                 lw   'd0, ('s0)
MEM(+(e0,CONST(c)))                     lw   'd0, c('s0)
MOVE(MEM(e0),e1)                        sw   's1, ('s0)
MOVE(MEM(+(e0,CONST(c))),e1)            sw   's1, c('s0)
JUMP(NAME(X))                           b    X
JUMP(e0)                                jr   's0
LABEL(X)                           X:   nop

Here e0 and e1 are subtrees of these IR tree nodes. Most of the assembly instructions above have one destination register d0 and a number of source registers, s0 and s1. So if a tree pattern, such as MOVE(MEM(e0),e1), has two input subtrees e0 and e1, then their values are taken from the source registers s0 and s1 (which are the destinations of e0 and e1). The quote in the assembly instructions is used to indicate that temporary registers should be selected so that there is this match of source-destination names.

To translate an IR tree into assembly code, we perform a tiling: we cover the IR tree with non-overlapping tiles. We can see that there are many different tilings. Consider for example the tiger statement a[i] := x, where i is a temporary register, the a address is stored at the offset 20, and x at the offset 10. The IR form is:

MOVE(MEM(+(MEM(+(fp,CONST(20))),*(TEMP(i),CONST(4)))),MEM(+(fp,CONST(10))))

The following are two possible tilings of the IR:

   lw   r1, 20($fp)                     add  r1, $fp, 20
   sll  r2, r1, 2                       lw   r1, (r1)
   add  r1, r1, r2                      sll  r2, r1, 2
   lw   r2, 10($fp)                     add  r1, r1, r2
   sw   r2, (r1)                        add  r2, $fp, x
                                        lw   r2, (r2)
                                        sw   r2, (r1)

The first tiling is obviously better since it can be executed faster.

It's highly desirable to do optimum tiling, ie, to generate the shortest instruction sequence (or alternatively the sequence with the fewest machine cycles). But this is not always easy to achieve. So most compilers do an optimal tiling where no two adjacent tiles can be combined into a tile of lower cost. Optimum tiling is not always optimal but it's close for RISC machines.

There are two main ways of performing optimum tiling: using maximal munch (a greedy algorithm) or using dynamic programming. In maximal munch you start from the IR root and from all matching tiles you select the one with the maximum number of IR nodes (largest tile). Then you go to the children (subtrees) of this tile and apply the algorithm recursively until you reach the tree leaves. The dynamic programming works from the leaves to the root. It assigns a cost to every tree node by considering every tile that matches the node and calculating the minimum value of:

cost of a node = (number of nodes in the tile) + (total costs of all the tile children)

(the costs of the tile children are known since the algorithm is bottom-up).

For example, consider the following IR represented as a tree and its tiling using maximal munch:

The order of tiling is IEHBDGACF (ie, top-down). After we set the tiles, we use a different register ri for each connection between the tiles. Then we expand the tiles into MIPS code by following the tile mapping table. The order of tile expansion is ABCDEFGHI (ie, bottom-up). The MIPS code is:

A      lw   r1, fp
B      lw   r2, 8(r1)
C      lw   r3, i
D      sll  r4, r3, 2
E      add  r5, r2, r4
F      lw   r6, fp
G      lw   r7, 16(r6)
H      add  r8, r7, 1
I      sw   r8, (r5)

Next: 11 Liveness Analysis Up: CSE 5317/4305: Design and Previous: 9 Basic Blocks and Contents

2015-01-20