Compiler support for speculative hardware acceleration
Because of their unrivaled energy efficiency, FPGA accelerators are making their way into data centers and cloud infrastructures. However, despite their advantages, there is a long way to go before FPGAs can qualify as a mainstream computing technology. One reason is that, since FPGAs operate as programmable logic, FPGA-based accelerators have to be “designed” as circuits rather than “programmed” like traditional computers.
The goal of High-Level Synthesis tools is to address this issue by allowing programmers to automatically derive high-performance hardware accelerators from an algorithmic specification in C/C++. These tools have been shown to improve design productivity by an order of magnitude, and are being used by a rapidly growing number of designers and companies around the world.
Many of the most important hardware optimizations for FPGAs (e.g., wordlength reduction and automatic parallelization) need accurate compile-time information about the target program to derive fast and efficient accelerators. This information is usually obtained through static analyses and/or source code annotations. Because static analyses operate at compile time, the information they gather may be imprecise (i.e., conservative) and may lead to inefficient hardware. This is especially true for target kernels with dynamic and/or unpredictable behavior [4,5].
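As an illustration of such a kernel, consider the minimal C sketch below. It is not taken from the cited work: the function name histogram and its parameters are hypothetical. The address written in each iteration depends on the input data, so a compile-time analysis cannot tell whether two consecutive iterations touch the same bin and has to assume that they might.

```c
#include <stddef.h>

/* Hypothetical kernel with data-dependent memory accesses.
 * Whether iteration i+1 reads the bin written by iteration i depends on
 * the run-time values in[i] and in[i+1], which are unknown at compile time. */
void histogram(const unsigned char *in, size_t n, unsigned int hist[256])
{
    for (size_t i = 0; i < n; i++) {
        /* Read-modify-write on an address computed from the input data:
         * a static analysis must conservatively assume a loop-carried
         * dependence between successive iterations. */
        hist[in[i]] += 1;
    }
}
```

A tool that has to honor this assumed dependence typically cannot overlap successive iterations as aggressively as it could if the accesses were known to be independent, even when the actual input rarely repeats the same bin back to back.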
Speculative optimizations are an attractive approach for such programs, since they can bring significant performance benefits at an (often) marginal area cost [3, 6]. However, selecting and implementing a good speculation strategy is often challenging, since the outcome of a speculative optimization results from many subtle trade-offs. As a consequence, designers would like to be able to quickly explore different approaches and choose the most efficient one. As of today, automating such transformations in the context of High-Level Synthesis tools remains an open challenge.
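The general shape of a speculative optimization (speculate, check, recover) can be sketched in plain C as follows. This is an illustrative software model only, not the mechanism proposed in this thesis: the function sum_speculative, its parameters, and the choice of a 16-bit speculated wordlength are all assumptions made for the example. The accumulator is speculatively kept narrow, a wrap-around check detects mis-speculation, and a wide recomputation serves as the recovery path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: speculate that the sum of n bytes fits in 16 bits.
 * On return, *misspeculated is 1 if the speculation failed, 0 otherwise.
 * Assumes n is small enough for the exact sum to fit in 32 bits. */
uint32_t sum_speculative(const uint8_t *x, size_t n, int *misspeculated)
{
    uint16_t narrow = 0;   /* speculated (cheap) wordlength */
    int overflow = 0;

    for (size_t i = 0; i < n; i++) {
        /* Unsigned arithmetic wraps modulo 2^16 after the cast. */
        uint16_t next = (uint16_t)(narrow + x[i]);
        /* Each addend is at most 255, so a smaller result means a wrap. */
        overflow |= (next < narrow);
        narrow = next;
    }

    *misspeculated = overflow;
    if (!overflow) {
        return narrow;     /* speculation held: the narrow result is exact */
    }

    /* Recovery path: recompute with a wide accumulator. */
    uint32_t wide = 0;
    for (size_t i = 0; i < n; i++) {
        wide += x[i];
    }
    return wide;
}
```

The trade-off is visible even in this small model: the check adds a comparison per iteration, and every mis-speculation pays for a second pass over the data, so the approach only pays off when mis-speculation is rare.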
The goal of this thesis is therefore to study which hardware/software speculation mechanisms can be implemented in state-of-the-art FPGA accelerator design tools, and how. In contrast to prior work [5,6], this thesis will focus on algorithm-level approaches, building on the large body of existing work on algorithm-level error detection [1,2] and combining it with advanced loop transformation techniques. Our approach will be based on source-to-source compilation and will target state-of-the-art platforms such as the Zynq UltraScale and Intel Xeon devices.