Recommendations for using Simulated Annealing in task mapping

Abstract

A Multiprocessor System-on-Chip (MPSoC) may contain hundreds of processing elements (PEs) and thousands of tasks, but design productivity lags behind the evolution of HW platforms. One problem is application task mapping, which seeks a placement of tasks onto PEs that optimizes several criteria: application runtime, intertask communication, memory usage, energy consumption, real-time constraints, and also area when PE selection or buffer sizing is combined with the mapping procedure. Among optimization algorithms for task mapping, this paper focuses on Simulated Annealing (SA) heuristics. We present a literature survey and 5 general recommendations for reporting heuristics that should allow disciplined comparisons and reproduction by other researchers. Most importantly, we present our findings about SA parameter selection and 7 guidelines for obtaining a good trade-off between solution quality and the algorithm's execution time. Notably, SA is compared against the global optimum. Thorough experiments were performed with 2–8 PEs, 11–32 tasks, 10 graphs per system, and 1000 independent runs, totaling over 500 CPU days of computation. Results show that SA offers a 4–6 orders of magnitude reduction in optimization time compared to brute force while achieving high-quality solutions. In fact, the globally optimal solution was reached with a 1.6–90 % probability when the problem size is around 1e9–4e9 possibilities. There is an approximately 90 % probability of finding a solution that is at most 18 % worse than the optimum.
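
To make the approach concrete, here is a minimal sketch in C of a basic SA task-mapping loop: a random single-task move, the standard exponential acceptance criterion, and geometric cooling. It is illustrative only and is not the paper's SA+AT reference implementation; the cost function evaluate_cost() is a hypothetical placeholder for a task-graph scheduler that returns, for example, application runtime for a given mapping.

    /* Illustrative SA task-mapping loop (standard SA, not the paper's SA+AT variant). */
    #include <math.h>
    #include <stdlib.h>

    /* Hypothetical, user-provided cost function: schedules the task graph under
     * the given task-to-PE mapping and returns e.g. application runtime. */
    double evaluate_cost(const int *mapping, int n_tasks);

    void sa_map(int *mapping, int n_tasks, int n_pes,
                double t0, double t_final, double alpha, int moves_per_level)
    {
        double cost = evaluate_cost(mapping, n_tasks);

        for (double T = t0; T > t_final; T *= alpha) {     /* geometric cooling */
            for (int i = 0; i < moves_per_level; i++) {
                int task = rand() % n_tasks;               /* move one random task */
                int old_pe = mapping[task];
                int new_pe = rand() % n_pes;
                if (new_pe == old_pe)
                    continue;

                mapping[task] = new_pe;
                double delta = evaluate_cost(mapping, n_tasks) - cost;

                /* Accept improvements always; accept worsenings with
                 * probability exp(-delta / T). */
                if (delta <= 0.0 || (double)rand() / RAND_MAX < exp(-delta / T))
                    cost += delta;
                else
                    mapping[task] = old_pe;                /* reject: undo the move */
            }
        }
    }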

Corresponding author

Correspondence to Heikki Orsila.

Appendix: Convergence results to larger systems with 3–6 PEs

Table 13 Proportion of SA+AT runs that converged within p of the global optimum for 3 PEs and 21 nodes. A higher value is better. SA+AT chooses L=42. The 90 % level is marked in boldface in each column
Table 14 Approximate expected number of mappings for SA+AT with 3 PEs and 21 nodes. SA+AT chooses L=42. The best (smallest) values are in boldface for each performance level p (row)
Table 15 Proportion of SA+AT runs that converged within p of the global optimum for 4 PEs and 17 nodes. A higher value is better. SA+AT chooses L=51
Table 16 Approximate expected number of mappings for SA+AT with 4 PEs and 17 nodes. SA+AT chooses L=51
Table 17 Proportion of SA+AT runs that converged within p of the global optimum for 6 PEs and 13 nodes. A higher value is better. SA+AT chooses L=65
Table 18 Approximate expected number of mappings for SA+AT with 6 PEs and 13 nodes. SA+AT chooses L=65
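
The L values quoted in these captions (42 for 21 tasks on 3 PEs, 51 for 17 tasks on 4 PEs, 65 for 13 tasks on 6 PEs) are consistent with SA+AT choosing the number of mappings per temperature level as L = N(M-1), where N is the number of tasks and M the number of PEs. A one-line helper under that assumption:

    /* Mappings per temperature level, assuming SA+AT's rule L = N * (M - 1);
     * this reproduces the values quoted in Tables 13-18 (42, 51, 65). */
    static inline int mappings_per_temperature_level(int n_tasks, int n_pes)
    {
        return n_tasks * (n_pes - 1);
    }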

Cite this article

Orsila, H., Salminen, E. & Hämäläinen, T. Recommendations for using Simulated Annealing in task mapping. Des Autom Embed Syst 17, 53–85 (2013). https://doi.org/10.1007/s10617-013-9119-0
