



PhD thesis subject proposed at Irisa for the 2001-2002 academic year


Hiding thousand-cycle memory latency through L2 prefetching

Location: Irisa, Rennes

Team(s): Caps

Advisor: André Seznec (direct line: 02 99 84 73 36, email:

The trend of the last 20 years in the electronics industry indicates that microprocessor frequencies will soon reach 10 GHz, while the access time of main memory will remain at a few tens of nanoseconds. The gap between processor performance and main memory will therefore continue to grow. The penalty for a data or instruction access that misses the on-die memory hierarchy will soon be on the order of thousands of instructions.

While there exists a class of applications whose working set moves very slowly and does not exhaust an L2 cache of a few megabytes, the memory gap represents the major obstacle to performance increases on another class of applications. For many of these applications, the parameters of a run heavily depend on the performance of the platform itself: the user will run the largest workload the platform is able to execute. Hiding the memory gap for those applications is a major issue.

The common current solution to the increasing gap between main-memory access time and processor performance is to add more and more L2 or even L3 cache space. Within a very few years, manufacturers will be able to implement several megabytes of static memory on the same die as a very wide-issue SMT superscalar processor or a multiprocessor, and it would be natural to use that space for L2 caches. However, enlarging the L2 cache leads to diminishing returns for most applications, unless the working set comes to fit in the L2 cache. To hide memory latencies on cache misses, prefetching techniques (both hardware and software) have also been proposed. Unfortunately, most of the currently proposed techniques are only efficient at hiding a few tens of instruction opportunities, i.e. the difference between an L1 cache hit and an L2 cache hit, but not the complete memory access time.

In this study, we propose to investigate a different approach to prefetching in order to hide the L2 cache miss latency (i.e. thousands of instruction slots). As a working hypothesis, we will assume that very substantial storage space (i.e. several megabytes) can be used to implement the prefetching structures, and that prefetches of very different granularities can be used (i.e. from a single cache line to a full physical page, but also a sparse set of lines inside a page). Two directions will be explored first: reuse of the same static memory address access pattern (i.e. the same flow of addresses), and determination of dynamic access patterns (e.g. strides, lists, ...). The objective is to hide a latency of a thousand cycles.



last updated: 12.03.2001
