Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) promise higher computing power and better energy efficiency than field-programmable gate arrays (FPGAs). They are therefore attractive not only for embedded applications but also for high-performance computing (HPC). In HPC applications, however, floating-point (FP) operations make up the main workload. Most previous research on CGRAs considered only operations on integral data types, which can be executed in one clock cycle. In contrast, FP operations take multiple clock cycles, and different operations have different latencies. In this contribution, we present a new mechanism that resolves data and structural hazards in processing elements (PEs) that feature in-order issue but out-of-order completion of operations. We show that our mechanism is more area-efficient than scoreboarding in most of the relevant cases. In addition, our mechanism is universal, i.e., it is not restricted to PEs in CGRAs but is also applicable to microprocessors.
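To make the problem setting concrete, the following minimal sketch models a single PE that issues operations in order while letting them complete out of order. It is not the mechanism proposed in the paper; it only illustrates the hazards the abstract names, using a simple stall-until-safe policy. The latency table, the single write-back port, the register names, and the function `schedule` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical latencies in clock cycles; these values are not from the paper.
LATENCY: Dict[str, int] = {"add": 1, "fadd": 3, "fmul": 5}

Op = Tuple[str, str, Tuple[str, ...]]  # (opcode, destination, sources)


def schedule(ops: List[Op]) -> List[Tuple[int, int]]:
    """Return (issue_cycle, completion_cycle) for each operation.

    Operations issue strictly in program order, at most one per cycle.
    An operation stalls while any of these hazards is present:
      * RAW: a source register still has a pending write,
      * WAW: the destination still has a pending write (conservative),
      * structural: the assumed single write-back port is already
        reserved in the cycle in which this operation would complete.
    """
    ready_at: Dict[str, int] = {}  # register -> cycle its pending write lands
    busy_writeback = set()         # cycles in which the write port is taken
    result = []
    cycle = 0
    for opcode, dest, srcs in ops:
        lat = LATENCY[opcode]
        while True:
            raw = any(ready_at.get(s, 0) > cycle for s in srcs)
            waw = ready_at.get(dest, 0) > cycle
            port_clash = (cycle + lat) in busy_writeback
            if not (raw or waw or port_clash):
                break
            cycle += 1  # stall for one cycle and re-check
        done = cycle + lat
        ready_at[dest] = done
        busy_writeback.add(done)
        result.append((cycle, done))
        cycle += 1  # the next operation can issue in the next cycle at the earliest
    return result


program: List[Op] = [
    ("fmul", "r1", ("r0", "r0")),
    ("fadd", "r2", ("r0", "r0")),
    ("add", "r3", ("r1", "r2")),
]
timeline = schedule(program)
```

With these assumed latencies, `timeline` is `[(0, 5), (1, 4), (5, 6)]`: the `fadd` is issued after the `fmul` but completes one cycle earlier, and the dependent `add` stalls until both results are available.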
Acknowledgment
This project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 283321772.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Käsgen, P.S., Weinhardt, M., Hochberger, C. (2019). Dynamic Scheduling of Pipelined Functional Units in Coarse-Grained Reconfigurable Array Elements. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science, vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_12
DOI: https://doi.org/10.1007/978-3-030-18656-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18655-5
Online ISBN: 978-3-030-18656-2
eBook Packages: Computer Science (R0)