Fault Tolerance in an Industrial Seismic Processing Application for Multicore Clusters

  • Conference paper
Recent Advances in the Message Passing Interface (EuroMPI 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6960)

Abstract

Seismic processing applications are used to identify geological structures where reservoirs of oil and gas may be found. With oil companies seeking better precision over larger geographical regions, these applications require larger clusters to keep execution times reasonable. The combination of longer run times and clusters with greater numbers of components increases the probability of faults during the execution. To address this issue, this paper describes an application-level fault tolerance mechanism that considers node crashes and communication link failures. For this industrial application, experiments show that continued execution with the remaining resources is both feasible and efficient.

Supported by PRONEX E-26/110.552/2010, CNPq, FAPERJ, PETROBRAS.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gonçalves, A., Bersot, M., Bulcão, A., Boeres, C., Drummond, L., Rebello, V. (2011). Fault Tolerance in an Industrial Seismic Processing Application for Multicore Clusters. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, vol 6960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24449-0_30

  • DOI: https://doi.org/10.1007/978-3-642-24449-0_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24448-3

  • Online ISBN: 978-3-642-24449-0

  • eBook Packages: Computer Science, Computer Science (R0)
