Abstract
Commercial off-the-shelf microprocessors are the core of low-cost embedded systems due to their programmability and cost-effectiveness. Recent advances in electronic technologies have brought remarkable improvements in their performance, but they have also made microprocessors more susceptible to transient faults induced by radiation. These non-destructive events (soft errors) may cause a microprocessor to produce a wrong computation result or to lose control of a system, with catastrophic consequences. Soft error mitigation has therefore become a compulsory requirement for an increasing number of applications, which operate anywhere from space to ground level. In this context, this paper uses the concept of selective hardening, which aims to produce flexible mitigation techniques with reduced overhead. Following this concept, a novel flexible version of the software-based fault recovery technique known as SWIFT-R is proposed. Our approach makes it possible to select different subsets of registers from the microprocessor register file to be protected in software. The design space is thus enriched with a wide spectrum of new partially protected versions, which offer more flexibility to designers and make it possible to find the best trade-offs between performance, code size, and fault coverage. Three case studies have been developed to show the applicability and flexibility of the proposal.
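The idea of protecting only a chosen subset of registers can be illustrated with a small simulation. The sketch below is not the authors' implementation, which operates on the program's instructions; it only assumes that protected registers are kept in three copies and recovered by majority voting, as in triple-redundancy schemes. The names `RegisterFile`, `majority`, and `flip_bit`, and the register names `r0` and `r1`, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three integer copies."""
    return (a & b) | (a & c) | (b & c)


@dataclass
class RegisterFile:
    """Toy register file (hypothetical model, not the paper's tool)."""

    protected: frozenset  # registers selected for software protection
    copies: dict = field(default_factory=dict)

    def write(self, reg: str, value: int) -> None:
        # Protected registers hold three copies, the others only one.
        n = 3 if reg in self.protected else 1
        self.copies[reg] = [value] * n

    def read(self, reg: str) -> int:
        vals = self.copies[reg]
        if len(vals) == 3:
            voted = majority(*vals)
            self.copies[reg] = [voted] * 3  # repair the corrupted copy
            return voted
        return vals[0]  # unprotected: a fault goes through unnoticed

    def flip_bit(self, reg: str, copy: int, bit: int) -> None:
        """Emulate a soft error: flip one bit in one stored copy."""
        self.copies[reg][copy] ^= 1 << bit


# Protect r0 only, then inject the same single-bit fault in r0 and r1.
rf = RegisterFile(protected=frozenset({"r0"}))
rf.write("r0", 5)
rf.write("r1", 5)
rf.flip_bit("r0", copy=0, bit=1)
rf.flip_bit("r1", copy=0, bit=1)
r0_value = rf.read("r0")  # voted value, fault masked
r1_value = rf.read("r1")  # corrupted value
```

In this toy model, growing the `protected` set raises fault coverage at the price of more storage and more voting work, which is the kind of trade-off between performance, code size, and fault coverage that the abstract refers to.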
Additional information
This work was funded by the Ministry of Science and Innovation in Spain with the project ‘RENASER+: Integral Analysis of Digital Circuits and Systems for Aerospace Applications’ (TEC2010-22095-C03-01).
Cite this article
Restrepo-Calle, F., Martínez-Álvarez, A., Cuenca-Asensi, S. et al. Selective SWIFT-R. J Electron Test 29, 825–838 (2013). https://doi.org/10.1007/s10836-013-5416-6
DOI: https://doi.org/10.1007/s10836-013-5416-6