TSS 2025 Lectures
Theme: Resilient and Reliable Systems
TSS will kick off on Friday at 8 PM with an introductory session, providing an overview of the program and an opportunity to get to know each other, followed by dinner. The courses will run from Saturday morning through Monday afternoon, with the final session taking place at the ETS venue. A transfer will be provided from TSS to ETS on Monday at noon.
In addition to participation in the school, TSS registration includes hotel lodging with SPA access from Friday to Monday, all meals, and social activities.
TSS 2025 focuses on reliability and safety, aiming towards the analysis and design of resilient and reliable ICs and systems. Basic concepts as well as advanced topics for AI/ML accelerators and emerging applications will be presented.
The Test Spring School 2025 will be held at LaSpa resort in Laulasmaa, which is located on a beach 40 km west of Tallinn Airport: https://www.laspa.ee. Bus transfers between the Conference hotel/Tallinn Airport and the TSS location will be available.
Lectures
1. Lifecycle Management of CMOS- and Emerging Technology-based Architectures: Why and How?
Leticia Bolzani Poelhs, IHP - Leibniz Institute for High Performance Microelectronics
Bio: Leticia Bolzani Poelhs graduated in Computer Science at the Federal University of Pelotas (UFPel), Brazil in 2002. In 2004, she received her M.Sc. Degree in Electrical Engineering at Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil, and in 2008 her Ph.D. in Computer Engineering from the Politecnico di Torino, Italy. From 2010 to 2022 she was a Professor at the School of Technology at PUCRS, and from 2019 to 2024 she led the Research Group of Test and Reliability of Emerging Applications at the Chair of Integrated Digital Systems, RWTH Aachen University, Germany. Since January 2025, she has led the Research Group of Neuromorphic Hardware at IHP - Leibniz Institute for High Performance Microelectronics, Germany. She is a member of the Steering Committee of IEEE LATS. She has served as track chair and program committee member for several conferences, such as DATE, ETS, DDECS, DTTIS, VTS and VLSI-SoC. She received the 2021 JETTA-TTTC Best Paper Award, the IEEE Latin American Test Symposium (LATS2022) Best Paper Award and the HiPEAC 2023 Paper Award for the paper at Design Automation Conference (DAC2023).
Abstract: The ever-increasing integration level of very complex CMOS- and emerging technology-based circuits in heterogeneous architectures requires a holistic approach to properly address all quality and reliability issues. In more detail, state-of-the-art architectures developed for high-performance applications are being implemented using not only CMOS technology, but also emerging technologies, such as memristive devices. Memristive devices can assume at least two different resistive states, and can therefore implement not only memory elements, but also computing elements. Different types of memristive devices exist, classified according to their switching mode, conductive path and working mechanism. In this context, Resistive Random-Access Memories (RRAMs) represent one of the most promising candidates to complement and/or replace CMOS technology.
RRAMs are being explored to replace charge-based memories, such as Dynamic RAMs (DRAMs) and flash. In more detail, these emerging memories address issues related to traditional memories’ manufacturing process, reliability, power consumption and performance. Despite these advantages, RRAMs are also susceptible to manufacturing deviations affecting their quality at time zero as well as to time-dependent deviations, which compromise their reliability during lifetime. In addition, the integration of these emerging memories with CMOS-based circuits poses significant design challenges. Thus, a holistic approach able to properly address these challenges, from design to obsolescence, is considered mandatory. This lecture starts by providing an introduction to test and reliability theory, including essential basic definitions. Afterwards, it presents the idea behind the lifecycle management approach for CMOS- and emerging technology-based architectures. The different lifecycle phases and related challenges are summarized. In addition, the main sources of quality and reliability issues in each lifecycle phase, as well as possible solutions to address them, are discussed. Finally, this lecture will allow attendees to understand the lifecycle management choices available to ensure high-quality, highly reliable state-of-the-art architectures based on CMOS and emerging technologies.
2. Radiation & Electromagnetic Interference on Modern ICs
Fabian Vargas, IHP - Leibniz Institute for High Performance Microelectronics
Bio: Fabian Vargas obtained the Ph.D. Degree in Microelectronics from the Institut National Polytechnique de Grenoble (INPG), France, in 1995. At present, he is a Senior Scientist at IHP - Leibniz Institute for High Performance Microelectronics, Germany, where he works on the design of on-chip sensors and cross-layer resilience for aerospace systems. Vargas has served as Technical Committee Member and Guest-Editor in many IEEE-sponsored conferences and journals. He holds several patents and has published over 200 refereed papers. Vargas was a researcher of the BR National Science Foundation from 1996 to 2023. He co-founded the IEEE-Computer Society Latin American Test Technology Technical Council (IEEE LA-TTTC) in 1997 and the IEEE Latin American Test Symposium (LATS) in 2000. He received the Meritorious Service Award of the IEEE Computer Society for providing significant services as chair of these groups. Vargas is a Golden Core Member of the IEEE Computer Society and a Senior Member of the IEEE.
Abstract: Simulation and laboratory measurements can never tell the whole story of how devices will behave in real-world use. In the real world, various interferences can occur simultaneously: the IC can be exposed, for instance, to extreme environmental temperatures, battery wear-out/instability, electromagnetic interference (EMI), ionizing radiation (TID, SEEs) and aging (BTI, HCI, TDDB, electromigration). Moreover, there are many standards used to certify electronic circuits & systems, but they are applied independently (on fresh devices), without considering the combined effects that one phenomenon may have on another.
In this ever-challenging context, this lecture addresses the fundamentals of ionizing radiation and EMI, the mechanisms by which they affect ICs, and the current standards and laboratory test setups for electromagnetic immunity, total ionizing dose (TID) and single-event effects (SEEs) on ICs.
The combined effects of ionizing radiation and electromagnetic interference on the reliability of modern ICs are discussed in detail. Conventional design solutions to counteract these threats are presented. Next, more elaborate solutions to trade off in-field power, performance, lifespan and reliability, based on the development of on-chip cross-layer sensors, are described. Such sensors monitor parameters ranging from the silicon level (aging, SEEs, power supply activity and temperature) to the system level (real-time operating system activity and task scheduling process). AI-based strategies to counteract noise on power-supply lines and predict soft-error rate are introduced. These solutions deal with designing ARINC 653-compliant systems and enable mission-mode monitoring of system operation, which is a critical aspect of the silicon lifecycle management (SLM) framework.
3. Reliability in Edge AI Systems
Matteo Sonza Reorda, Politecnico di Torino
Bio: Matteo SONZA REORDA received the MS degree in Electronics in 1986 and the PhD degree in Computer Engineering in 1990, both from Politecnico di Torino (Italy). Since 1990 he has been with the Dept. of Control and Computer Engineering of the Politecnico di Torino, where he is currently a Full Professor and leads a research group working on test and fault tolerant design of ICs and systems. He has published more than 400 papers on these topics, and is involved in several research projects with companies and public bodies. He is a Fellow of IEEE.
Abstract: In recent years Artificial Intelligence has experienced a wide and rapid adoption in many application domains. The traditional architecture of AI-based systems, where the processing was performed in the cloud, has been complemented with an alternative one, where some processing is done close to the sensors/actuators (Edge AI). Edge AI is made possible by the availability of powerful and still affordable devices, able to perform the AI processing within an embedded system. The Edge AI paradigm may provide several advantages with respect to the traditional Cloud AI, e.g., in terms of speed, power, security and bandwidth. On the other hand, it may raise issues in terms of reliability, e.g., because processing is performed in harsher environments by less expensive (and less reliable) devices. The course will first discuss the main solutions which have been proposed/adopted to estimate the reliability of Edge AI systems and then will summarize the state of the art of the techniques to harden them when required, focusing on the effects of transient and permanent faults affecting the hardware in charge of the computation. An overview of the current and upcoming standards in the field will also be provided. We will conclude by discussing the main open issues faced by researchers and professionals in the area.
Syllabus:
1. Edge AI: introduction and motivations for considering reliability issues
2. Reliability evaluation: challenges and solutions
3. Reliability enhancement
a. HW solutions (at device and system level)
b. SW solutions (acting on the libraries, the models, the training)
4. Open issues and Conclusions
Prerequisites (basic skills):
- Digital design
- Computer architectures
- AI
- Test and reliability
4. Automotive Functional Safety
Paolo Bernardi, Politecnico di Torino
Bio: Paolo Bernardi (MS'02 and PhD'06 in Computer Science) is an Associate Professor of the Politecnico di Torino University, working in the Electronic CAD and Reliability research group. His current interests include System-on-Chip test and reliability, especially in the direction of high-quality automotive devices. Prof. Bernardi has been the General Chair of the European Test Symposium 2023 (ETS23) and the Program Chair of the International Test Conference 2025 (ITC25). He is an IEEE senior member.
Abstract: The lecture covers several topics related to the Safety of automotive devices. The first part discusses the flow and steps necessary to reach a sufficient chip quality during the manufacturing test. Then, the focus moves to in-field techniques from the lower level of design for online testability, up to Silicon Lifecycle Management topics. The talk also references the most used standards in the automotive functional safety domain.
Syllabus:
- Automotive Mega-Trends
- Manufacturing test of automotive chips
- In-field reliability
- Silicon Lifecycle Management
- Automotive Functional Safety Standards
5. Enhancing the Reliability of Deep Learning Models for Safety-Critical Applications
Masoud Daneshtalab, Mälardalen University
Bio: Masoud Daneshtalab is a full professor at Mälardalen University in Sweden and adjunct professor at TalTech in Estonia. He is the scientific leader of AI@MDU, director of the deep learning and heterogeneous system (DeepHERO) lab with over 20 PhD students, and co-leader of the Heterogeneous System research group. He is on the Euromicro board of directors, and an editor of the MICPRO journal. His research interests encompass algorithm-hardware co-design, embedded-friendly and reliable AI, and interconnection networks. Masoud has authored over 220 journal and conference papers and has developed open-source tools that enhance AI reliability and performance, especially in safety-critical systems. He has led multiple academic and industrial research projects with an estimated total of 20 MEuro.
Abstract: This lecture explores some strategies to improve the resilience and reliability of deep learning models for safety-critical applications. One approach enhances fault tolerance and reduces memory overhead by combining neuron-wise and layer-wise clipped activation functions, resulting in significantly improved resilience under high bit error rates. Another strategy focuses on progressive adversarial robust distillation, where robustness is transferred from a large teacher model to various smaller student networks, optimizing accuracy, robustness, and efficiency for resource-constrained devices. Finally, a scalable, semi-analytical fault resilience analysis method will be introduced, offering faster simulation times and enabling real-time reliability assessments for large models.
6. Radiation Effects on Electronics and Mitigation Strategies: HARV, a RISC-V-Based Approach to Radiation-Hardened Computing
Luigi Dilillo, University of Montpellier
Bio: Luigi Dilillo received his master's degree in Electronic Engineering from the Politecnico di Torino, Italy, in 2001, with specialization in Microelectronics. He received his Ph.D. degree in 2005 at the LIRMM laboratory (University of Montpellier). From 2005 to 2007, he was a researcher at the University of Southampton, United Kingdom, and then at CEA (French Alternative Energies and Atomic Energy Commission). Since 2007 he has been a permanent researcher at CNRS (French National Centre for Scientific Research), and he is currently at IES (Institute for Electronics and Systems) as head of the RADIAC team.
Abstract: Radiation effects on electronic devices present critical challenges, particularly in environments with high radiation exposure, such as space, nuclear facilities, and high-energy physics experiments. This lecture explores the origins of radiation, the characteristics of radiative environments, and their impact on electronic components. The primary degradation mechanisms, namely Total Ionizing Dose (TID), Displacement Damage (DD), and Single Event Effects (SEE), are analyzed in detail, with a particular emphasis on SEE. Among these, Single Event Upsets (SEUs) in sequential circuits such as memory, as well as Single Event Transients (SETs), are examined due to their significant implications for system reliability. To address these issues, various error mitigation strategies are discussed, focusing on fault-tolerant structures that enhance system resilience against radiation-induced failures. The role of irradiation facilities in testing and validating these mitigation techniques is also highlighted, emphasizing their importance in ensuring the robustness of electronic systems for critical applications. As a practical case study, the lecture presents HARV, a radiation-hardened RISC-V processor developed by IES-RADIAC. The design choices, implementation strategies, and performance evaluations of HARV are explored to illustrate how advanced radiation-hardening techniques can be effectively integrated into modern computing architectures. This study provides valuable insights into the development of reliable electronics for extreme environments, offering guidance for future research and innovation in radiation-hardened system design.