Fault-Tolerant Emerging On-Chip Interconnects

Location
Lannion
Context
This postdoc position will be funded by the Rakes and AllOpticall2 ANR Projects.
These projects involve Inria Taran (Rennes/Lannion), INL (Lyon), Lab-STICC (Lorient), and TIMA (Grenoble).

The Taran team already has a strong background in on-chip interconnects and in the emerging interconnect paradigms (WiNoC, ONoC) targeted in this project.
Mission

Subject
For several years, we have been witnessing the emergence of manycore architectures, i.e., the implementation of massive parallelism on a single chip. Combined with shrinking transistor sizes, these manycore architectures are expected to integrate thousands of heterogeneous cores, providing massive parallel computation capabilities suited to high-performance embedded computing systems and HPC.
In the last decade, electrical Networks-on-Chip (ENoCs) have emerged as an efficient solution for multicore architectures, in the range of tens of cores on a chip, to circumvent the parallelism limitations of traditional buses. Nevertheless, as the manycore era progresses, ENoCs suffer from limited scalability in terms of latency and energy because of the large increase in the number of hops between cores [1, 2]; emerging technologies are therefore called upon to supplement this traditional interconnect.
In parallel, technology evolution has allowed the integration of silicon photonics and wireless communications on chip, leading to the Wireless Network-on-Chip (WiNoC) [3, 4] and Optical Network-on-Chip (ONoC) [5, 6] paradigms. These emerging technologies show significant advantages for broadcasting data (WiNoC) and for low-latency communications (ONoC), whereas the conventional ENoC is reaching its limits [1]. For future on-chip interconnects, it seems clear that relying on a single technology will lead to inefficient solutions; hence hybrid NoCs, which combine two technologies, are being adopted.
However, disruptive technologies suffer from higher variability because their fabrication processes are still maturing. This hinders their deployment and calls for optimization methods and dedicated fault-tolerant hardware designs to improve their robustness. Moreover, as we approach the limit of CMOS scaling, it becomes increasingly unlikely that a computing device will be fully functional, given the various sources of faults, especially in harsh environments such as space [7].

This calls for fault-tolerant techniques that enhance robustness or limit the impact of faults on error-resilient applications, e.g., neural networks or approximate computing [8], executed on emerging on-chip interconnects.

The scope of the postdoc position is relatively open, and applicants are expected to identify the direction that suits them best according to their background and interests. The goal is to improve the fault tolerance of emerging on-chip interconnects in the context of manycore architectures, and we seek systematic methods to answer the following key questions:
•    How can the reliability of emerging interconnects be analyzed, and how can faults be detected and localized?
•    How can the robustness of emerging interconnects be improved?
•    How can a faulty emerging on-chip interconnect continue to be used, for instance in the context of error-tolerant applications?

Bibliography

[1]    A. Karkar et al., “A Survey of Emerging Interconnects for On-Chip Efficient Multicast and Broadcast in Many-Cores,” IEEE Circuits and Systems Magazine, vol. 16, pp. 58–72, 2016.
[2]    W. Wolf, A. A. Jerraya, and G. Martin, “Multiprocessor System-on-Chip (MPSoC) Technology,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp. 1701–1713, 2008.
[3]    M. F. Chang et al., “CMP Network-on-Chip Overlaid with Multi-Band RF-Interconnect,” Proc. of IEEE Int. Symposium on High-Performance Computer Architecture (HPCA), pp. 191–202, 2008.
[4]    J. Ortiz Sosa, O. Sentieys, and C. Roland, “A Diversity Scheme to Enhance the Reliability of Wireless NoC in Multipath Channel Environment,” Twelfth IEEE/ACM International Symposium on Networks-on-Chip (NOCS), Torino, Italy, pp. 1–8, Oct. 2018.
[5]    A. Shacham, K. Bergman, and L. P. Carloni, “Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors,” IEEE Transactions on Computers, vol. 57, no. 9, pp. 1246–1260, 2008.
[6]    J. Luo, C. Killian, D. Chillet, S. Le Beux, I. O'Connor, and O. Sentieys, “Offline Optimization of Wavelength Allocation and Laser Power in Nanophotonic Interconnects,” ACM Journal on Emerging Technologies in Computing Systems (JETC), 2018.
[7]    “Space Product Assurance: Techniques for Radiation Effects Mitigation in ASICs and FPGAs Handbook,” tech. rep., ESA Requirements and Standards Division, Sept. 2016.
[8]    C. Torres-Huitzil and B. Girau, “Fault and Error Tolerance in Neural Networks: A Review,” IEEE Access, vol. 5, pp. 17322–17341, 2017.
 

Profile / Skills
Expected profile of the candidates:
- PhD in Computer Science, Electrical Engineering, or Computer Engineering.
- Strong background in fault tolerance, multi/manycore architectures, and on-chip interconnects.
- Familiarity with manycore simulators is greatly appreciated.
- Programming experience, e.g., in C/C++ and Python.
- Good knowledge of computer architecture, hardware design, and embedded systems.

What is valued the most is autonomy. We expect the postdoc to be motivated and capable of setting short- and mid-term objectives independently.
Expected hiring date
01/05/2023
Apply
Contacts and application: submit a CV, a cover letter, recommendation letters, and any other document that may support your application to:

• Daniel CHILLET, daniel.chillet@irisa.fr
• Cédric KILLIAN, cedric.killian@irisa.fr
• Olivier SENTIEYS, olivier.sentieys@irisa.fr