Design-space exploration of fault-tolerant multicores

Team and supervisors
Team Web Site: 
https://team.inria.fr/cairn/
PhD Director
Olivier Sentieys
Co-director(s), co-supervisor(s)
Angeliki Kritikakou
Contact(s)
Name: Olivier Sentieys
Email address: olivier.sentieys@irisa.fr
Phone number: 0299847216
PhD subject
Abstract

The consumer market has shifted towards multicore architectures because the clock speed of single processors can no longer be increased, owing to power consumption and heat dissipation limits [4]. Compared with single-core processors, multicores reduce Space, Weight and Power (SWaP) and offer massive computing capabilities, while integrating diverse applications on the same platform [1]. However, shrinking transistor sizes, with technologies at 28nm and below, have made multicores increasingly sensitive to environmental effects [2], such as ionizing, particle and high-energy electromagnetic radiation, extreme weather conditions, high temperature peaks and electromagnetic interference. Such stimuli disturb the system, alter its normal functionality and create faults during operation [3]. Reliability has therefore become an essential aspect of multicore architectures. Several fault-tolerant approaches have been proposed in the literature to improve system reliability. However, no single solution can provide the required reliability at low cost for every problem. The most promising fault-tolerant method depends on the faults that actually occur during execution, the application and the platform of the problem under study.

This PhD focuses on fault-tolerant multicore architectures and has two main goals: 1) to gain insight into the impact of faults on multicore architectures, in order to model the impact of single (SEU, SET) and multiple (MBU) errors at different levels of abstraction, and 2) to design and develop a novel method to explore the design space of promising fault-tolerant techniques.

During the first part of this thesis, we will study the impact of faults on the basic components of a multicore architecture, i.e. the memory, the core and the interconnect. The study will use a shared-memory multicore built from RISC-V cores, specified at the C level through high-level synthesis and designed in a 28nm technology. To achieve this, we need to develop models that describe the faulty behavior of these components by raising the abstraction of existing fault models from the gate level up to the architecture level.
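As a rough illustration of what an architecture-level fault model can look like, the Python sketch below applies a single-bit upset (SEU) and a multiple-bit upset (MBU) to a stored word. It is only a sketch under stated assumptions: the function names are hypothetical, the MBU is assumed to hit adjacent bits, and transient faults in combinational logic (SET) are not modeled. It does not describe the models the thesis will actually develop.

```python
import random


def inject_seu(word: int, width: int, rng: random.Random) -> int:
    """Flip one randomly chosen bit of a `width`-bit word (SEU model)."""
    return word ^ (1 << rng.randrange(width))


def inject_mbu(word: int, width: int, n_bits: int, rng: random.Random) -> int:
    """Flip `n_bits` adjacent bits of a `width`-bit word.

    Adjacency is an assumption of this sketch: it mimics one particle
    strike upsetting neighbouring cells.
    """
    start = rng.randrange(width - n_bits + 1)   # lowest flipped bit position
    mask = ((1 << n_bits) - 1) << start         # n_bits consecutive ones
    return word ^ mask


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for repeatable experiments
    golden = 0xDEADBEEF             # fault-free reference value
    seu = inject_seu(golden, 32, rng)
    mbu = inject_mbu(golden, 32, 3, rng)
    print(f"SEU: {seu:#010x}, {hamming_distance(golden, seu)} bit flipped")
    print(f"MBU: {mbu:#010x}, {hamming_distance(golden, mbu)} bits flipped")
```

Comparing a faulty run against the fault-free reference in this way is one common means of observing how an error propagates from a component to the architecture level.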

During the second part, we will define the set of relevant fault-tolerant techniques within our domain and organize them in a binary classification scheme. Each class will be characterized by the reliability it can offer and the overhead it imposes on the design (performance, area, energy). The possible fault scenarios, based on the abstract models developed during the first part, will be mapped to the corresponding fault-tolerant classes. Next, we will define a novel design space exploration methodology and design the corresponding tools to efficiently explore the fault tolerance design options. The methodology will be based on pruning methods over the binary classification and on optimization strategies. Its result is the set of most promising fault-tolerant approaches for given fault scenarios and platform characteristics: those that reduce system cost while providing reliability and real-time guarantees. A RISC-V multicore architecture will be used to evaluate the proposed methodology.
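To make the idea of pruning a design space more concrete, the Python sketch below discards every option that is dominated on both reliability and overhead, then picks the cheapest remaining option that meets a reliability target. Pareto-dominance pruning is used here only as one possible strategy, not as the methodology the thesis will propose. The option names, the single aggregated overhead figure and all numeric values are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    name: str
    reliability: float  # higher is better (placeholder metric)
    overhead: float     # lower is better (placeholder aggregate of
                        # performance, area and energy cost)


def dominates(a: Option, b: Option) -> bool:
    """True if a is at least as good as b on both axes and better on one."""
    return (a.reliability >= b.reliability and a.overhead <= b.overhead
            and (a.reliability > b.reliability or a.overhead < b.overhead))


def pareto_front(options: List[Option]) -> List[Option]:
    """Prune every option that is dominated by some other option."""
    return [o for o in options if not any(dominates(p, o) for p in options)]


def cheapest_meeting(options: List[Option],
                     min_reliability: float) -> Optional[Option]:
    """Lowest-overhead non-dominated option meeting the reliability target."""
    feasible = [o for o in pareto_front(options)
                if o.reliability >= min_reliability]
    return min(feasible, key=lambda o: o.overhead, default=None)


# Purely illustrative values, invented for this sketch.
OPTIONS = [
    Option("A", reliability=0.90, overhead=1.0),
    Option("B", reliability=0.99, overhead=2.0),
    Option("C", reliability=0.95, overhead=2.5),   # dominated by B
    Option("D", reliability=0.999, overhead=3.0),
]

if __name__ == "__main__":
    print([o.name for o in pareto_front(OPTIONS)])
    print(cheapest_meeting(OPTIONS, min_reliability=0.95))
```

A real exploration would treat performance, area and energy as separate objectives and would add real-time constraints. The single overhead figure only keeps the example short.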

 

This thesis is funded by a project involving INRIA, ONERA, and Temento Systems.

Bibliography
  1. F. Lemonnier, P. Millet, G. Marchesan Almeida, et al. “Towards future adaptive multiprocessor systems-on-chip: an innovative approach for flexible architectures,” in IC-SAMOS, 2012.
  2. D. Gizopoulos, M. Psarakis, S.V. Adve, et al. “Architectures for Online Error Detection and Recovery in Multicore Processors,” in DATE, 2011.
  3. D. P. Siewiorek and P. Narasimhan, “Fault-tolerant architectures for space and avionics applications,” technical report, Carnegie Mellon University, 2008.
  4. Di Carlo, P. Prinetto, D. Rolfo, and P. Trotta, “A fault injection methodology and infrastructure for fast single event upsets emulation on Xilinx SRAM-based FPGAs,” in 2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). IEEE, 2014, pp. 159–164.
  5. S. Jafri, S. J. Piestrak, O. Sentieys, and S. Pillement, “Design of the coarse-grained reconfigurable architecture DART with on-line error detection,” Microprocessors and Microsystems, vol. 38, pp. 124–136, Mar. 2014.
  6. S. Jafri, S. J. Piestrak, O. Sentieys, and S. Pillement, “Design of a fault-tolerant coarse-grained reconfigurable architecture: A case study,” in Proc. of the 11th IEEE International Symposium on Quality Electronic Design (ISQED 2010), (San Diego, CA, USA), 6 pages, IEEE, Mar. 2010.
  7. M. Gatti, “Development and certification of avionics platforms on multi-core processors,” in Tutorial Mixed-Criticality Systems: Design and Certification Challenges, ESWeek, (Montreal, Canada), 2013.
  8. The RISC-V Instruction Set Architecture, http://riscv.org, 2016.
  9. R. Psiakis, A. Kritikakou, and O. Sentieys, “NEDA: NOP Exploitation with Dependency Awareness for Reliable VLIW Processors,” in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 3-5, 2017.
  10. R. Psiakis, A. Kritikakou, and O. Sentieys, “Run-Time Instruction Replication for Permanent and Soft Error Mitigation in VLIW Processors,” in 15th IEEE Int. NEW Circuits and Systems Conference (NEWCAS), 2017.
Work start date: 
As soon as possible
Keywords: 
multicore architecture; fault tolerance
Place: 
IRISA - Campus universitaire de Beaulieu, Rennes