The instruction cache
The goal of the caches is to improve the speed of the transfers
from/to memory. Since the address of each instruction is statically
known (unlike many data references) taking into account the instruction
fetching speed improvement due to the instruction cache is possible.
The presence (or absence) of an instruction in the instruction cache
impacts on the WCET of a basic block.
Intruction cache simulation
On the embedded Pentium processor, the instruction cache is organized
as a 2-way set associative cache. There are 128 sets and each set
contains 2 lines (a cache line is 32 bytes wide). The static simulation
technique we used in heptane to estimate the instruction cache behavior
is based on F.
Mueller work. The method determines for each instruction reference
whether it will result in a cache hit or if it may cause a miss
during program execution.
Instruction cache simulation result
formalism
Our formalism defines the cache worst case behavior
at the level of instruction blocks. An instruction block is a portion
of a basic block that fits exactly in a cache line. It may contain
several instruction(s) and/or instruction fragment(s).

The proposed data structure provides for every
instruction block information on its presence or absence in the
cache according to the loop nesting
hierarchy. As a basic block is made up of several instruction
blocks, the proposed data structure, called IB-est
(instruction block behavior estimation), is a set of (iblock, ln-level)
couples.
The branch prediction mechanism
The branch prediction mechanism aims at improving the pipeline
efficiency in the presence of CT-instructions.
When a CT-instruction is met, a break in the flow of the instructions
in the pipeline may occur, because the microprocessor does not know
which instruction to execute until it has finished executing the
CT-instruction. Modern processors incorporate a branch prediction
mechanism which allows the processor to decode instructions beyond
branches in order to keep the instruction pipeline full. This mechanism
predicts whether a CT-instruction will be taken or not, and when
predicted taken the target address of this CT-instruction.
Branch prediction simulation
In our context (Pentium processor) predictions are based on the
Branch Target Buffer (BTB). The branch prediction mechanism makes
predictions according to the branching history of CT-instructions.
The technique we use to take into account branch prediction is the
static BTB simulation.
Branch prediction simulation result formalism
The static branch prediction simulation aims at estimating which
CT-instruction executions will be mispredicted. These estimations
are ln-level dependent. Considering that there are two possibilities
when a CT-instruction is executed (jumping and in sequence), each
CT-instruction is associated with two potentially different results.
The chosen data structure (namely CTI-est) is very
similar to the previous one: two ln-levels are associated to every
basic block, one per branching possibilities (a basic block contains
at most one CT-instruction). The ln-level indicates which loop level
execution will cause a conflict that may result in a CT-instruction
misprediction.
Pipeline
The goal of pipelined execution is to increase the instruction
throughput, i.e. the number of completed instructions per clock
cycle. Pipelining allows various execution steps of instructions
to simultaneously overlap. The estimation of the execution time
of a basic block depends on the considered ln-level.
Adapting parameters for pipeline simulation
The static pipelined execution simulation estimates the worst case
time needed for a basic block to be executed assuming a known context
of execution. This context is made of IB-est
and CTI-est of the basic
block and a ln-level. These data structures are transformed by the
adaptation layer to provide parameters of the PipeSim function.
Its parameters are defined as follows.
Given a ln-level L, an IB-est set is split into two
subsets. IBhit_L is the set of instruction blocks
which are in the instruction cache when loop L is iterated, and
IBmiss_L contains other instruction blocks of the
basic block.
Given a ln-level (L) and a taken branch (jmp or seq),
CTIdelay is computed from CTI-est.
It is a boolean indicating whether a misprediction has to be simulated
or not.
PipeSim(BBx,IBhit_L,IBmiss_L,CTIdelay)=time
Pipeline simulation (PipeSim)
The execution time of a sequence of instructions is estimated by
modeling the five stages of the Pentium pipeline as a set of resources
and representing each instruction as a process that acquires and
consumes a subset of these resources. Information about the resource
needs of the instructions is obtained through the retargetable System
for Assembly Language Transformation and Optimization (Salto). The
obtained reservation table is an intermediate timing information.

Pipeline simulation result formalism
The result of the PipeSim function is a timing information,
that can be either relative (a clock cycle number) or absolute (seconds,
micro-seconds, etc.). The timing information (results of PipeSim)
associated with ln-level L are formalized by:
< time , L >.
WCET of basic blocks
The WCET of a basic block depends on the considered
ln-level and branch execution. WCET_BBx represents the WCET of the
basic block BB for all ln-levels assuming that branch x (x = seq
or jmp) is executed. The WCET representation take instruction
cache and branch prediction effects into account by using the PipeSim
function. The WCET representation allows to express the WCET
of a basic block for every particular ln-level context.
|