Publications -- Abstracts

Full papers


Pattern and Approximate-Pattern Matching for Program Compaction

The application of pattern and approximate-pattern matching to program compaction offers considerable benefits in restructuring code as a means of reducing its size. We survey the field of code compaction and compression, considering code size, the effect on execution time, and additional resources, and present a literature survey identifying representative papers.

Concentrating on compaction, we discuss phase-ordering and the two main code compaction techniques -- code factoring and procedural abstraction. Relating to common subsequence computation we consider compact suffix trees, suffix arrays and suffix cacti. After introducing classical pattern matching, we review parameterized pattern matching, based on a modified suffix tree (the parameterized suffix tree).

We consider four supporting technologies: static single assignment (SSA) form simplifies data flow analysis; equivalence considers the issue of whether two sections of code are equivalent; normalization, which considers code structuring techniques to aid pattern matching; and register allocation, which we argue should be done after code compaction.


Technical Report: triVM Intermediate Language Reference Manual

The triVM intermediate language has been developed as part of a research programme concentrating on code space optimization. It is intended to be an experimental platform to support investigations into code analyses and transformations, providing a high-degree of flexibility and capability.

The primary aim in developing the triVM intermediate language is to provide a language that removes the complexity of high-level languages, such as C or ML, while maintaining sufficient detail, at as simple a level as possible, to support reseach and experimentation into code size optimization.

A secondary aim is to develop an intermediate language that supports graph-based translation systems, using graph rewrite rules, in a textual, human-readable format. Experience has shown that text-format intermediate files are much easier to use for experimentation, while the penalty in translating this human-readable form to the more compact binary data structure format used by the software is negligible.

Another secondary aim is to provide a flexible language in which features and innovations can be evaluated. For example, this is one of the first intermediate languages (as opposed to data structures) to directly support the Single Static Assignment property. Another feature is the exposing of condition codes as one of the results obtained from arithmetic operations.

The basic structure of triVM is a notional three-address machine, wherein the majority of arithmetic instructions take three registers---two supply the arguments and the result is placed in the third. This instruction format is somewhat similar to that of the ARM Thumb.


Combined Code Motion and Register Allocation using the Value State Dependence Graph

We define the Value State Dependence Graph (VSDG). The VSDG is a form of the Value Dependence Graph (VDG) extended by the addition of state dependence edges to model sequentialised computation. These express store dependencies and loop termination dependencies of the original program. We also exploit them to express the additional serialization inherent in producing final object code.

The central idea is that this latter serialization can be done incrementally so that we have a class of algorithms which effectively interleave register allocation and code motion, thereby avoiding a well-known phase-order problem in compilers. This class operates by first normalizing the VSDG during construction, to remove all duplicated computation, and then repeatedly choosing between:

  1. allocating a value to a register,
  2. spilling a value to memory,
  3. moving a loop-invariant computation within a loop to avoid register spillage, and
  4. statically duplicating a computation to avoid register spillage.

We show that the classical two-phase approach (code motion then register allocation in both Chow and Chaitin forms) are examples of this class, and propose a new algorithm based on depth-first cuts of the VSDG.


Using Multiple Memory Access Instructions for Reducing Code Size

An important issue in embedded systems design is the size of programs. As computing devices decrease in size, yet with more and more functions, better code size optimizations are in greater demand. For an embedded RISC processor, where the use of compact instructions (e.g., the ARM Thumb) restricts the number of accessible registers at the expense of a potential increase in spill code, a significant proportion of instructions load or store to memory.

In this paper we present a new technique which both identifies sequences of single load and store instructions for combining into multiple load and store instructions, and guides the layout of function stack frames, global storage and register allocation, previously only seemingly done by opportunistic optimization. We implement this in our SolveMMA algorithm, similar to Liao's Simple Offset Assignment algorithm.

We implement our algorithm within the Value State Dependence Graph framework, describe how the algorithm is specialized for specific processors, and use the ARM Thumb as a concrete example for analysis.