Instrumentation is a way to obtain precise information about the behavior of an application. Typical data of interest include the execution frequency of basic blocks, used to focus optimization on the critical parts of a program, or the addresses of memory accesses, used to drive a cache simulation. This kind of code transformation is easily achieved with Salto and has already enabled many studies in fields such as branch prediction, code layout, and data layout.
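As a minimal sketch of block-frequency instrumentation (using a hypothetical in-memory code model, not Salto's actual API), a counter-increment probe can be prepended to each basic block and accumulated during execution:

```python
# Sketch of basic-block profiling instrumentation (hypothetical model, not Salto's API).
from collections import defaultdict

class BasicBlock:
    def __init__(self, name, insns):
        self.name = name
        self.insns = list(insns)

def instrument(blocks, counters):
    """Prepend a counter-increment probe to each basic block."""
    for bb in blocks:
        # The inserted probe bumps this block's execution counter when run.
        bb.insns.insert(0, lambda c=counters, n=bb.name: c.__setitem__(n, c[n] + 1))
    return blocks

def run(blocks, trace):
    """Execute blocks in the order given by a (simulated) dynamic trace."""
    by_name = {bb.name: bb for bb in blocks}
    for name in trace:
        for insn in by_name[name].insns:
            if callable(insn):
                insn()          # instrumentation probe
            # real instructions would execute here

counters = defaultdict(int)
blocks = instrument([BasicBlock("entry", ["mov", "add"]),
                     BasicBlock("loop", ["ld", "add", "bne"])], counters)
run(blocks, ["entry", "loop", "loop", "loop"])
print(dict(counters))  # {'entry': 1, 'loop': 3}
```

The collected counters identify the hot blocks on which optimization effort should be concentrated.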
List scheduling is a simple low-level optimization that can be applied to every basic block in a program.
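A minimal list scheduler can be sketched as follows (a toy single-issue model, not Salto's implementation): at each cycle, pick one ready instruction according to a priority heuristic, here the longest latency first.

```python
# Minimal list scheduler for one basic block (a sketch, not Salto's implementation).
# Each instruction is (name, latency, set of instruction names it depends on).
def list_schedule(insns):
    done = {}                      # name -> cycle at which its result is ready
    pending = list(insns)
    schedule = []                  # (issue cycle, name), one instruction per cycle
    cycle = 0
    while pending:
        # Ready = all dependencies have completed by the current cycle.
        ready = [i for i in pending
                 if all(done.get(d, float("inf")) <= cycle for d in i[2])]
        if ready:
            # Priority heuristic: issue the longest-latency instruction first.
            name, lat, _ = max(ready, key=lambda i: i[1])
            schedule.append((cycle, name))
            done[name] = cycle + lat
            pending = [i for i in pending if i[0] != name]
        cycle += 1
    return schedule

insns = [("load", 3, set()), ("add", 1, {"load"}),
         ("mul", 2, set()), ("store", 1, {"add", "mul"})]
print(list_schedule(insns))  # [(0, 'load'), (1, 'mul'), (3, 'add'), (4, 'store')]
```

The `add` waits until cycle 3 for the `load` result, so issuing the independent `mul` at cycle 1 hides part of the load latency.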
All information about the code that is statically available can be extracted and displayed, and the algorithms are usually written in a straightforward manner. In this example we extract the loop structure of a program.
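One standard way to recover loop structure from a control-flow graph (shown here as a self-contained sketch, independent of Salto's data structures) is to compute dominators, find back edges, and collect each back edge's natural loop:

```python
# Sketch: recovering loop structure from a CFG by finding back edges
# (an edge n -> h where h dominates n) and their natural loops.
def dominators(cfg, entry):
    nodes = set(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for n in nodes - {entry}:
            preds = [p for p in nodes if n in cfg[p]]
            new = ({n} | set.intersection(*(dom[p] for p in preds))) if preds else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def natural_loops(cfg, entry):
    dom = dominators(cfg, entry)
    loops = {}                              # loop header -> set of body nodes
    for n in cfg:
        for h in cfg[n]:
            if h in dom[n]:                 # back edge n -> h
                body, work = {h, n}, [n]
                while work:                 # all nodes reaching n without passing h
                    m = work.pop()
                    for p in cfg:
                        if m in cfg[p] and p not in body:
                            body.add(p); work.append(p)
                loops[h] = body
    return loops

cfg = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
print(natural_loops(cfg, "entry"))  # {'head': {'head', 'body'}}
```

Nested loops show up naturally: an inner header's loop body is a subset of the enclosing loop's body.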
SEA is the name of a new research project in the field of compilation. This project aims to eliminate two major problems encountered when compiling programs:
- the loss of information between successive compilation phases: for instance, memory access disambiguation is easy when comparing A[i] and A[i+1], but much more difficult when the optimizer is faced with assembly instructions;
- the built-in one-way compilation scheme, which is known to give good results in the majority of cases but cannot deal with pathological cases, on which it yields poor results.

SEA implements a strong bidirectional link between the code generator and the optimization process in Salto via a simple language called IL, which maintains a map between high-level statements and low-level pieces of code. High-level information is made available to the optimizer, which can give feedback to the code generator about the performance achieved. For instance, the optimizer can request that a loop be unrolled fewer times to avoid high register pressure.
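The kind of map IL maintains, and the feedback the optimizer can send back through it, might be sketched as follows (a hypothetical data model for illustration, not the actual IL language):

```python
# Illustrative sketch of a statement-to-instruction map with optimizer feedback
# (hypothetical data model, not the actual IL language).
class ILMap:
    def __init__(self):
        self.stmt_to_insns = {}    # statement id -> instruction indices it produced
        self.attrs = {}            # statement id -> code-generation attributes

    def record(self, stmt, insns, **attrs):
        self.stmt_to_insns[stmt] = list(insns)
        self.attrs[stmt] = dict(attrs)

def optimizer_feedback(il, stmt, live_registers, available_registers):
    """Ask the code generator for a smaller unroll factor under register pressure."""
    factor = il.attrs[stmt].get("unroll", 1)
    if live_registers > available_registers and factor > 1:
        return {"stmt": stmt, "unroll": factor // 2}   # request: unroll less
    return None

il = ILMap()
il.record("loop1", range(0, 24), unroll=4)     # loop1 was unrolled 4 times
print(optimizer_feedback(il, "loop1", live_registers=40, available_registers=32))
# {'stmt': 'loop1', 'unroll': 2}
```

The point of the map is that the feedback is expressed in terms of the high-level statement, which the code generator understands, rather than in terms of individual assembly instructions.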
The compiler possesses a set of transformations. Each one is able to modify a piece of code and to measure its impact. It must also be able to undo its work and to report information about its behavior on a particular piece of code.
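Such a transformation interface can be sketched as an apply/measure/undo contract (hypothetical names and a toy cost model, for illustration only):

```python
# Sketch of the transformation interface described above (hypothetical names):
# each transformation can apply itself, measure its impact, and undo its work.
class Transformation:
    def apply(self, code): raise NotImplementedError
    def undo(self, code): raise NotImplementedError
    def measure(self, code): raise NotImplementedError   # e.g. estimated cycles

class Unroll(Transformation):
    def __init__(self, factor):
        self.factor = factor
        self._saved = None

    def apply(self, code):
        self._saved = list(code)            # keep a copy so undo is possible
        return code * self.factor

    def undo(self, code):
        return self._saved

    def measure(self, code):
        return len(code)                    # toy cost model: instruction count

code = ["ld", "add", "st"]
t = Unroll(2)
unrolled = t.apply(code)
if t.measure(unrolled) > 2 * t.measure(code):   # worse than expected? roll back
    unrolled = t.undo(unrolled)
print(unrolled)   # ['ld', 'add', 'st', 'ld', 'add', 'st']
```

With this contract, a driver can try a transformation speculatively, keep it when the measured impact is positive, and undo it otherwise.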
A significant part of this work is integrated in the Esprit LTR OCEANS project.
In order to simulate microprocessor architectures or memory hierarchies, realistic program traces are often used. Software trace collection induces a significant execution slowdown. Moreover, tracing large applications results in enormous trace files. On-line tracing and simulation addresses this latter problem but is also very time-consuming and cannot realistically be envisaged for large applications executing trillions of instructions. Simulating such applications (even if complete traces could be collected) requires sampling.
An interesting approach consists in sampling the execution by duplicating the source code (at the assembly-language level). One version of the code is nearly identical to the original code and executes most of the time. A second version of the code is further instrumented by the user and executed parsimoniously, in order to collect a trace without slowing down the overall application too much. At execution time, some events cause execution to switch to the instrumented version of the code; other events cause it to return to the original-like version. The choice of these events determines the execution overhead, making the approach very flexible.
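The two-version scheme can be illustrated with a simplified sketch (in Python for readability; the real technique duplicates assembly code, and the switch events here are just random triggers chosen for the example):

```python
# Sketch of the two-version sampling scheme (simplified; the real technique
# duplicates assembly code). The light version runs most of the time; switch
# events divert execution to the instrumented copy for a short window.
import random

trace = []          # memory references collected by the instrumented version

def light_block(addr):
    pass                                  # near-original code: no tracing

def instrumented_block(addr):
    trace.append(addr)                    # user instrumentation: record the access

def run(accesses, sample_len=10, switch_prob=0.01, seed=0):
    rng = random.Random(seed)
    remaining = 0                         # >0 while in the instrumented version
    for addr in accesses:
        if remaining > 0:
            instrumented_block(addr)
            remaining -= 1                # event: return to the light version
        else:
            light_block(addr)
            if rng.random() < switch_prob:
                remaining = sample_len    # event: switch to instrumented version
    return trace

collected = run(range(100000))
print(f"sampled {len(collected)} of 100000 references")
```

Tuning `sample_len` and the switching events trades trace coverage against overhead, which is what makes the approach flexible.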
We implemented a prototype (not yet optimized) with Salto to validate this approach. The results obtained with the SPECInt95 benchmarks show an execution overhead of 2-25% when running the light version of the code (without user instrumentation). The same applications, running a fully instrumented version and collecting about 1% of the overall trace (counting memory references in a straightforward way), showed an execution slowdown of 25-85%.
Programmable processors are playing an increasingly important role in the design of embedded systems. Software running on these processors provides flexibility and facilitates the reuse of existing programmable hardware components. However, the architectural mechanisms of many domain-specific processors prevent the use of classical high-level language compilers.
Our project aims at reducing the gap between application-specific processors and code generation tools by using software compilation tools during the processor architecture design process. This design process is iterative: at each step, parts of the application are compiled for the hypothetical architecture, and the results are then used to update the processor architecture (for example, by modifying the instruction set). The final result should be an efficient application-oriented architecture together with an associated set of code generation tools.
In such a process, compilation tools must be easily adaptable to different processor architectures. To this end, we are currently developing a framework for designing simple compilers, based on a processor model and on several flexible code generation tools. Salto is the low-level component of this framework. The higher-level parts produce sequential object code, and Salto is used to apply optimizations such as compaction or software pipelining. In our approach, the machine description of the target architecture is automatically generated from our high-level processor model.
PiLo and LoRA are optimization kernels focusing on loops in programs. They are based on periodic scheduling and graph coloring algorithms. The target architecture is described through static reservation tables and the number of available registers in each register file. Code is represented by its data dependence graph. Salto is now interfaced with PiLo and LoRA: Salto computes the data dependences between instructions, sends the architectural information, and rebuilds the new code according to the resulting schedule and register allocation scheme.
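The dependence computation can be sketched from the instructions' read and write register sets, producing the classical RAW/WAR/WAW edges of the data dependence graph (a simplified illustration, not Salto's actual analysis, which must also handle memory references):

```python
# Sketch: computing data dependences between instructions from their
# read/write register sets (RAW, WAR, WAW edges of the dependence graph).
def dependences(insns):
    """insns: list of (name, reads, writes); returns edges (i, j, kind) with i < j."""
    edges = []
    for j, (_, reads_j, writes_j) in enumerate(insns):
        for i in range(j):
            _, reads_i, writes_i = insns[i]
            if writes_i & reads_j:
                edges.append((i, j, "RAW"))   # true (flow) dependence
            if reads_i & writes_j:
                edges.append((i, j, "WAR"))   # anti-dependence
            if writes_i & writes_j:
                edges.append((i, j, "WAW"))   # output dependence
    return edges

insns = [("ld r1,[r0]",   {"r0"},       {"r1"}),
         ("add r2,r1,r1", {"r1"},       {"r2"}),
         ("st [r0],r2",   {"r0", "r2"}, set())]
print(dependences(insns))  # [(0, 1, 'RAW'), (1, 2, 'RAW')]
```

The resulting graph, together with the reservation tables and register counts, is exactly the input a periodic scheduler and a register allocator need.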