Stream Processors

Multicore Processors and Systems

Part of the book series: Integrated Circuits and Systems ((ICIR))

Abstract

Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. They are designed specifically for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather–compute–scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control, consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
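The gather–compute–scatter form described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names (`gather`, `scatter`, `kernel_scale_add`, `stream_saxpy`), the flat `memory` list standing in for off-chip memory, and the `strip_size` parameter standing in for on-chip buffer capacity are all assumptions made for illustration. The point is only that data movement is explicit and bulk, while the kernel touches nothing but local buffers.

```python
def gather(memory, indices):
    """Bulk-load the elements named by `indices` into a local buffer."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Bulk-store `values` back to the locations named by `indices`."""
    for i, v in zip(indices, values):
        memory[i] = v


def kernel_scale_add(xs, ys, a):
    """Compute kernel: operates only on local buffers, never on `memory`."""
    return [a * x + y for x, y in zip(xs, ys)]


def stream_saxpy(memory, x_idx, y_idx, out_idx, a, strip_size=4):
    """Process the streams one strip at a time (hypothetical strip size).

    Each iteration gathers a strip of inputs, runs the kernel on the
    local copies, and scatters the results, so every transfer to and
    from `memory` is an explicit, coarse-grained operation.
    """
    for start in range(0, len(x_idx), strip_size):
        end = start + strip_size
        xs = gather(memory, x_idx[start:end])    # gather
        ys = gather(memory, y_idx[start:end])
        zs = kernel_scale_add(xs, ys, a)         # compute
        scatter(memory, out_idx[start:end], zs)  # scatter
    return memory


if __name__ == "__main__":
    mem = list(range(12)) + [0] * 6
    stream_saxpy(mem, list(range(0, 6)), list(range(6, 12)),
                 list(range(12, 18)), a=2)
    print(mem[12:18])
```

In this sketch the software, not a cache, decides what is resident in the local buffers and when it moves, which is the division of responsibility the abstract attributes to stream architectures.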

Acknowledgments

We would like to thank Steve Keckler for his insightful comments as well as the contributions of Jung Ho Ahn, Nuwan Jayasena, and Brucek Khailany. In addition, we are grateful to the entire Imagine and Merrimac teams and the projects’ sponsors.

Imagine was supported by a Sony Stanford Graduate Fellowship, an Intel Foundation Fellowship, the Defense Advanced Research Projects Agency under ARPA order E254 and monitored by the Army Intelligence Center under contract DABT63-96-C0037 and by ARPA order L172 monitored by the Department of the Air Force under contract F29601-00-2-0085.

The Merrimac Project was supported by the Department of Energy ASCI Alliances Program, Contract LLNL-B523583, with Stanford University as well as the NVIDIA Graduate Fellowship program.

Portions of this chapter are reprinted with permission from the following sources:

  • U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens, “Programmable Stream Processors,” IEEE Computer, August 2003 (©2003 IEEE).

  • J. H. Ahn, W. J. Dally, B. K. Khailany, U. J. Kapasi, and A. Das, “Evaluating the Imagine stream architecture,” In Proceedings of the 31st Annual International Symposium on Computer Architecture (© 2004 IEEE).

  • Stream Processors Inc., “Stream Processing: Enabling the New Generation of Easy to Use, High-Performance DSPs,” White Paper (© 2007 Stream Processors Inc.).

  • B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey, and W. J. Dally, “A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing,” Solid-State Circuits, IEEE Journal, 43(1):202–213, 2008 (© 2008 IEEE).

  • J. H. Ahn, M. Erez, and W. J. Dally, “Tradeoff between Data-, Instruction-, and Thread-level Parallelism in Stream Processors,” In Proceedings of the 21st ACM International Conference on Supercomputing (ICS’07), June 2007 (DOI 10.1145/1274971.1274991). © 2007 ACM, Inc. Included here by permission.

Author information

Corresponding author

Correspondence to Mattan Erez.

Copyright information

© 2009 Springer-Verlag US

About this chapter

Cite this chapter

Erez, M., Dally, W.J. (2009). Stream Processors. In: Keckler, S., Olukotun, K., Hofstee, H. (eds) Multicore Processors and Systems. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0263-4_8

  • DOI: https://doi.org/10.1007/978-1-4419-0263-4_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0262-7

  • Online ISBN: 978-1-4419-0263-4

  • eBook Packages: Computer Science, Computer Science (R0)
