accueil

carte
-
recherche

aide

-

Heptane core

-

Heptane

Static WCET Analyzer

 

The instruction cache

The goal of the caches is to improve the speed of the transfers from/to memory. Since the address of each instruction is statically known (unlike many data references) taking into account the instruction fetching speed improvement due to the instruction cache is possible. The presence (or absence) of an instruction in the instruction cache impacts on the WCET of a basic block.

Intruction cache simulation

On the embedded Pentium processor, the instruction cache is organized as a 2-way set associative cache. There are 128 sets and each set contains 2 lines (a cache line is 32 bytes wide). The static simulation technique we used in heptane to estimate the instruction cache behavior is based on F. Mueller work. The method determines for each instruction reference whether it will result in a cache hit or if it may cause a miss during program execution.

Instruction cache simulation result formalism

Our formalism defines the cache worst case behavior at the level of instruction blocks. An instruction block is a portion of a basic block that fits exactly in a cache line. It may contain several instruction(s) and/or instruction fragment(s).

The proposed data structure provides for every instruction block information on its presence or absence in the cache according to the loop nesting hierarchy. As a basic block is made up of several instruction blocks, the proposed data structure, called IB-est (instruction block behavior estimation), is a set of (iblock, ln-level) couples.

 

The branch prediction mechanism

The branch prediction mechanism aims at improving the pipeline efficiency in the presence of CT-instructions. When a CT-instruction is met, a break in the flow of the instructions in the pipeline may occur, because the microprocessor does not know which instruction to execute until it has finished executing the CT-instruction. Modern processors incorporate a branch prediction mechanism which allows the processor to decode instructions beyond branches in order to keep the instruction pipeline full. This mechanism predicts whether a CT-instruction will be taken or not, and when predicted taken the target address of this CT-instruction.

Branch prediction simulation

In our context (Pentium processor) predictions are based on the Branch Target Buffer (BTB). The branch prediction mechanism makes predictions according to the branching history of CT-instructions. The technique we use to take into account branch prediction is the static BTB simulation.

Branch prediction simulation result formalism

The static branch prediction simulation aims at estimating which CT-instruction executions will be mispredicted. These estimations are ln-level dependent. Considering that there are two possibilities when a CT-instruction is executed (jumping and in sequence), each CT-instruction is associated with two potentially different results. The chosen data structure (namely CTI-est) is very similar to the previous one: two ln-levels are associated to every basic block, one per branching possibilities (a basic block contains at most one CT-instruction). The ln-level indicates which loop level execution will cause a conflict that may result in a CT-instruction misprediction.

Pipeline

The goal of pipelined execution is to increase the instruction throughput, i.e. the number of completed instructions per clock cycle. Pipelining allows various execution steps of instructions to simultaneously overlap. The estimation of the execution time of a basic block depends on the considered ln-level.

Adapting parameters for pipeline simulation

The static pipelined execution simulation estimates the worst case time needed for a basic block to be executed assuming a known context of execution. This context is made of IB-est and CTI-est of the basic block and a ln-level. These data structures are transformed by the adaptation layer to provide parameters of the PipeSim function. Its parameters are defined as follows.

Given a ln-level L, an IB-est set is split into two subsets. IBhit_L is the set of instruction blocks which are in the instruction cache when loop L is iterated, and IBmiss_L contains other instruction blocks of the basic block.
Given a ln-level (L) and a taken branch (jmp or seq), CTIdelay is computed from CTI-est. It is a boolean indicating whether a misprediction has to be simulated or not.

PipeSim(BBx,IBhit_L,IBmiss_L,CTIdelay)=time

Pipeline simulation (PipeSim)

The execution time of a sequence of instructions is estimated by modeling the five stages of the Pentium pipeline as a set of resources and representing each instruction as a process that acquires and consumes a subset of these resources. Information about the resource needs of the instructions is obtained through the retargetable System for Assembly Language Transformation and Optimization (Salto). The obtained reservation table is an intermediate timing information.

Pipeline simulation result formalism

The result of the PipeSim function is a timing information, that can be either relative (a clock cycle number) or absolute (seconds, micro-seconds, etc.). The timing information (results of PipeSim) associated with ln-level L are formalized by:

< time , L >.


WCET of basic blocks

The WCET of a basic block depends on the considered ln-level and branch execution. WCET_BBx represents the WCET of the basic block BB for all ln-levels assuming that branch x (x = seq or jmp) is executed. The WCET representation take instruction cache and branch prediction effects into account by using the PipeSim function. The WCET representation allows to express the WCET of a basic block for every particular ln-level context.

 

 

 
dernière mise à jour : 02 02 2001

-- english version --- acolin@irisa.fr --- ©copyright --


accueil
 

w3c-html4