Pattern and Approximate-Pattern Matching for Program Compaction
- The application of pattern and approximate-pattern matching
to program compaction offers considerable benefits in restructuring code
as a means of reducing its size. We survey the field of code compaction
and compression, considering code size, the effect on execution time, and
additional resources, and present a literature survey identifying representative
Concentrating on compaction, we discuss phase-ordering
and the two main code compaction techniques -- code factoring and procedural
abstraction. Relating to common subsequence computation we consider compact
suffix trees, suffix arrays and suffix cacti. After introducing classical
pattern matching, we review parameterized pattern matching, based on a modified
suffix tree (the parameterized suffix tree).
We consider four supporting technologies: static single
assignment (SSA) form simplifies data flow analysis; equivalence considers
the issue of whether two sections of code are equivalent; normalization,
which considers code structuring techniques to aid pattern matching; and
register allocation, which we argue should be done after code compaction.
Technical Report: triVM Intermediate Language
- The triVM intermediate language has been developed
as part of a research programme concentrating on code space optimization.
It is intended to be an experimental platform to support investigations
into code analyses and transformations, providing a high-degree of flexibility
The primary aim in developing the triVM intermediate
language is to provide a language that removes the complexity of high-level
languages, such as C or ML, while maintaining sufficient detail, at as simple
a level as possible, to support reseach and experimentation into code size
A secondary aim is to develop an intermediate language
that supports graph-based translation systems, using graph rewrite rules,
in a textual, human-readable format. Experience has shown that text-format
intermediate files are much easier to use for experimentation, while the
penalty in translating this human-readable form to the more compact binary
data structure format used by the software is negligible.
Another secondary aim is to provide a flexible language
in which features and innovations can be evaluated. For example, this is
one of the first intermediate languages (as opposed to data structures)
to directly support the Single Static Assignment property. Another feature
is the exposing of condition codes as one of the results obtained from arithmetic
The basic structure of triVM is a notional
three-address machine, wherein the majority of arithmetic instructions take
three registers---two supply the arguments and the result is placed in the
third. This instruction format is somewhat similar to that of the ARM Thumb.
Combined Code Motion and Register Allocation using the
Value State Dependence Graph
- We define the Value State Dependence Graph (VSDG). The VSDG is a form of the
Value Dependence Graph (VDG) extended by the addition of
state dependence edges to model sequentialised computation.
These express store dependencies and loop
termination dependencies of the original program.
We also exploit them to express the additional serialization inherent
in producing final object code.
The central idea is that this latter serialization
can be done incrementally so that we have a class
of algorithms which effectively interleave register allocation and code motion,
thereby avoiding a well-known phase-order problem in compilers. This class
operates by first normalizing the VSDG during construction, to remove all
duplicated computation, and then repeatedly choosing between:
- allocating a value to a register,
- spilling a value to memory,
- moving a loop-invariant computation within a loop
to avoid register spillage, and
- statically duplicating a computation
to avoid register spillage.
We show that the classical two-phase approach (code motion then register
allocation in both Chow and Chaitin forms) are examples of this class,
and propose a new algorithm based on depth-first cuts of the VSDG.
Using Multiple Memory Access Instructions for Reducing Code Size
- An important issue in embedded systems design is the size of programs.
As computing devices decrease in size, yet with more and more functions, better code size
optimizations are in greater demand.
For an embedded RISC processor, where the use of compact instructions
(e.g., the ARM Thumb) restricts the number of accessible registers at
the expense of a potential increase in spill code, a significant proportion of
instructions load or store to memory.
In this paper we present a new technique which both identifies sequences of single load and
store instructions for combining into multiple load and store instructions, and
guides the layout of function stack frames, global storage and register
allocation, previously only seemingly done by opportunistic optimization.
We implement this in our SolveMMA algorithm, similar to Liao's Simple
Offset Assignment algorithm.
We implement our algorithm within the Value State Dependence Graph framework,
describe how the algorithm is specialized for specific processors, and use the
ARM Thumb as a concrete example for analysis.