Sébastien HILY

Sébastien HILY received his Ph.D. in the CAPS group in June 1997

Senior Researcher. Intel MRL IA32 Architecture Unit.

contact information

Intel Microcomputer Research Labs

Intel MRL
Mailstop: EY2-09
5350 NE Elam Young Parkway
Hillsboro, OR 97124-6461, USA

phone: (503) 696-3857
fax: (503) 696-1442

e-mail: shily@ichips.intel.com

research interests

microprocessors
computer architecture
simultaneous multithreading

research

Étude du parallélisme monolithique : cas du multiflot simultané

Doctoral dissertation (in French), June 1997: file.ps.gz (854904 bytes)

Several million transistors can already be integrated on a single circuit. At the same time, the internal clock frequencies of microprocessors are increasing steadily. The gap between the internal clock (on the chip) and the external clock (on the motherboard) keeps growing, so memory access times, measured in processor cycles, are becoming very large. To exploit these technological trends, several forms of parallelism will have to be developed and integrated on the chip. Among the techniques dealing with instruction-level and thread-level parallelism, simultaneous multithreading (SMT) appears to be one of the most promising.

An SMT microprocessor allows several instruction streams to execute simultaneously in a shared superscalar pipeline, so that the pipeline is used as fully as possible. The simultaneous execution of several threads, however, introduces new constraints at the architectural level that need to be examined in detail.

The work presented in this thesis shows that branch prediction tables can be shared by different threads, whether the workload consists of independent applications or of a single parallel program. However, giving each thread a private return address stack greatly improves prediction accuracy. The memory hierarchy turned out to be far more critical: its parameters determine the degree of multithreading beyond which adding threads is no longer cost-effective. To obtain the best performance, it is particularly important to have associative first-level caches and small block sizes. However, contention on the second-level cache should limit the interest of multithreading to a few threads. Lastly, we show that with only 4 threads, an architecture featuring simultaneous multithreading can rely on simple in-order execution. The performance gain brought by out-of-order execution is too small to justify implementing complex mechanisms.

Branch prediction and simultaneous multithreading

Branch prediction strategies for superscalar architectures now achieve more than 90% accuracy. We explore how the simultaneous use of prediction tables by several threads affects branch prediction accuracy. In particular, we try to characterize whether threads benefit from sharing large prediction structures, both for multiprogrammed workloads and for parallel applications. We also examine the usefulness of providing one private Return Address Stack per active thread.
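The arrangement studied here, with prediction tables shared by all threads and one return address stack per thread, can be illustrated with a minimal Python sketch. It is not the simulator used in the report: the class names, the table of 2-bit saturating counters indexed by the branch address, and the table and stack sizes are all assumptions chosen for illustration.

```python
from collections import deque


class SharedBranchPredictor:
    """Table of 2-bit saturating counters shared by all threads (illustrative)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # values 0..3, start at weakly not-taken

    def _index(self, pc):
        # Assumed indexing scheme: low-order bits of the branch address.
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # True means "taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnAddressStack:
    """Small per-thread stack of predicted return addresses."""

    def __init__(self, depth=8):
        self.stack = deque(maxlen=depth)  # oldest entry is dropped on overflow

    def push(self, return_pc):
        self.stack.append(return_pc)

    def pop(self):
        return self.stack.pop() if self.stack else None


class SMTFrontEnd:
    """One shared predictor, one private return address stack per thread."""

    def __init__(self, n_threads, entries=1024, ras_depth=8):
        self.predictor = SharedBranchPredictor(entries)
        self.ras = [ReturnAddressStack(ras_depth) for _ in range(n_threads)]

    def on_call(self, tid, return_pc):
        self.ras[tid].push(return_pc)

    def predict_return(self, tid):
        return self.ras[tid].pop()
```

With private stacks, calls interleaved between threads do not disturb each other's return predictions, whereas a single shared stack would pop another thread's return address. The shared counter table, on the other hand, lets one thread's training be reused by another thread running the same code.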

Related publication:

S. Hily, A. Seznec, "Branch Prediction and Simultaneous Multithreading", 25 pages, IRISA Report No 997, March 1996. A short version appeared at PACT'96, Boston MA.

Memory hierarchy and simultaneous multithreading

Simultaneous multithreading (SMT) is an interesting way of maximizing performance by enhancing processor utilization. We investigate issues involving the behavior of the memory hierarchy with SMT. First, we show that ignoring L2 cache contention leads to strongly overestimating the achievable performance and may lead to incorrect conclusions. We then explore the impact of various memory hierarchy parameters. We show that the number of supported threads has to be set according to the cache size, that the L1 caches have to be associative, and that small blocks have to be used. Consequently, the hardware constraints on the design of memory hierarchies should limit the interest of SMT to a few threads.
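Why associativity matters when several threads share an L1 cache can be seen in a toy model. The sketch below is a minimal set-associative cache with LRU replacement, written in Python for illustration only; the class name, the helper `run_interleaved`, the cache geometry and the addresses are assumptions and do not come from the report.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache with LRU replacement inside each set (illustrative)."""

    def __init__(self, n_sets, ways, block_size):
        self.n_sets = n_sets
        self.ways = ways
        self.block_size = block_size
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.block_size
        tag = block // self.n_sets
        lines = self.sets[block % self.n_sets]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


def run_interleaved(cache, streams, rounds):
    """Replay each thread's address stream in turn and return the miss count."""
    for _ in range(rounds):
        for stream in streams:
            for address in stream:
                cache.access(address)
    return cache.misses


# Two hypothetical threads, each reusing one block; the blocks map to the same set.
thread_a = [0x000]
thread_b = [0x800]

# Both caches hold 2 KB with 32-byte blocks.
direct_mapped = SetAssociativeCache(n_sets=64, ways=1, block_size=32)
two_way = SetAssociativeCache(n_sets=32, ways=2, block_size=32)

dm_misses = run_interleaved(direct_mapped, [thread_a, thread_b], rounds=10)
tw_misses = run_interleaved(two_way, [thread_a, thread_b], rounds=10)
```

In this contrived case the two threads evict each other on every access in the direct-mapped cache (20 misses out of 20 accesses), while the 2-way cache of the same capacity misses only on the first access of each thread (2 misses). Real workloads are far less extreme, but the example shows the kind of inter-thread conflict that associativity absorbs.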

Related publication:

S. Hily, A. Seznec, "Standard Memory Hierarchy Does Not Fit Simultaneous Multithreading", Proceedings of MTEAC'98 Workshop (in conjunction with HPCA 4), Feb. 1998
A longer version is available as "Contention on 2nd Level Cache May Limit The Effectiveness of Simultaneous Multithreading", 22 pages, IRISA Report No 1086, Feb. 1997

In-order and out-of-order SMT models

Simultaneous multithreading (SMT) is a promising approach to deliver high throughput from superscalar pipelines. In this paper, we show that when executing 4 threads on an SMT processor, out-of-order execution brings only small performance benefits over in-order execution. Thus, for application domains where throughput is more important than peak performance on a single application, SMT combined with in-order execution may be a more cost-effective alternative than either very aggressive out-of-order superscalar processors or out-of-order SMT.

S. Hily, A. Seznec, "Out-Of-Order Execution May Not Be Cost-Effective on Processors Featuring Simultaneous Multithreading", IRISA Report No 1179, March 1998

teaching

All my teaching was for computer science students at the "Institut de Formation Supérieure en Informatique et Communication" (IFSIC), Université de Rennes 1

1993-1996: Lecturer
1996-1997: ATER (Attaché Temporaire d'Enseignement et de Recherche), equivalent to a visiting assistant professor (6 contact hours/week)

Computer Architecture (DIIC2 : ARA1, ARA2)
Compilation techniques (DIIC2 : CPL1; DIIC3 : CPL2)
Algorithms (DESSDC : ALG)

other

PROJET R.I.S.C. Master's thesis (1991)

LE COPROCESSEUR OPAC : algorithmes compute-bound, architecture, micro-instructions.

Report

Évaluation de cache. Rapport DRET-INRIA No 93.082 (1993)

last update: 31 01 2000