Abstract
Emerging workloads exhibit irregular memory access patterns with poor data reuse and locality, so they would benefit from a DRAM that achieves low latency without sacrificing bandwidth or energy efficiency. We propose LLM (Low Latency Memory), a co-design of the DRAM microarchitecture, the memory controller, and the LLC/DRAM interconnect that leverages embedded silicon photonics in a 2.5D/3D integrated system on chip. LLM relies on Wavelength Division Multiplexing (WDM)-based photonic interconnects to reduce contention throughout the memory subsystem. LLM also increases bank-level parallelism, eliminates bus conflicts by using dedicated optical data paths, and reduces the access energy per bit with shorter global bitlines and smaller row buffers. We evaluate the design space of LLM for a variety of synthetic benchmarks and representative graph workloads on a full-system simulator (gem5). LLM exhibits low memory access latency for traffic with both regular and irregular access patterns. For irregular traffic, LLM achieves high bandwidth utilization (over 80% of peak throughput, compared to 20% for HBM2.0). For real workloads, LLM achieves 3\(\times \) and 1.8\(\times \) lower execution time than HBM2.0 and a state-of-the-art memory system with high memory-level parallelism, respectively. This study also demonstrates that, by reducing queuing on the data path, LLM achieves on average 3.4\(\times \) lower memory latency variation than HBM2.0.
This work was supported in part by ARO award W911NF1910470.
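The abstract's link between bank-level parallelism and irregular traffic can be illustrated with a toy occupancy model. The sketch below is not the paper's gem5 setup: it assumes requests are spread uniformly at random over the banks, and the function names, bank counts, and queue depth are hypothetical values chosen only for illustration. It estimates how many distinct banks a batch of outstanding requests can keep busy, both in closed form and by simulation.

```python
import random


def expected_busy_banks(num_banks: int, outstanding: int) -> float:
    """Closed-form expected number of distinct banks touched when
    `outstanding` requests each pick one of `num_banks` uniformly at random."""
    return num_banks * (1.0 - (1.0 - 1.0 / num_banks) ** outstanding)


def simulated_busy_banks(num_banks: int, outstanding: int,
                         trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (toy model, assumed uniform traffic)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Requests that collide on a bank serialize, so only distinct banks count.
        touched = {rng.randrange(num_banks) for _ in range(outstanding)}
        total += len(touched)
    return total / trials


if __name__ == "__main__":
    outstanding = 32  # hypothetical number of in-flight requests
    for banks in (16, 64, 256):  # hypothetical bank counts
        model = expected_busy_banks(banks, outstanding)
        sim = simulated_busy_banks(banks, outstanding)
        print(f"banks={banks:4d}  model={model:6.2f}  simulated={sim:6.2f}")
```

Under these assumptions, a larger bank count lets more of the in-flight random requests proceed in parallel instead of queuing behind one another on the same bank. The model ignores timing parameters, row-buffer state, and data-bus conflicts, all of which the paper evaluates in full-system simulation.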
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Fariborz, M. et al. (2022). LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads. In: Varbanescu, AL., Bhatele, A., Luszczek, P., Marc, B. (eds) High Performance Computing. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13289. Springer, Cham. https://doi.org/10.1007/978-3-031-07312-0_3
DOI: https://doi.org/10.1007/978-3-031-07312-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07311-3
Online ISBN: 978-3-031-07312-0
eBook Packages: Computer Science, Computer Science (R0)