Abstract
A Multiprocessor System-on-Chip (MPSoC) may contain hundreds of processing elements (PEs) and thousands of tasks, but design productivity is lagging behind the evolution of HW platforms. One problem is application task mapping, which tries to find a placement of tasks onto PEs that optimizes several criteria such as application runtime, intertask communication, memory usage, energy consumption, and real-time constraints, as well as area when PE selection or buffer sizing is combined with the mapping procedure. Among optimization algorithms for task mapping, this paper focuses on Simulated Annealing (SA) heuristics. We present a literature survey and 5 general recommendations for reporting heuristics that should allow disciplined comparisons and reproduction by other researchers. Most importantly, we present our findings about SA parameter selection and 7 guidelines for obtaining a good trade-off between solution quality and the algorithm’s execution time. Notably, SA is compared against the global optimum. Thorough experiments were performed with 2–8 PEs, 11–32 tasks, 10 graphs per system, and 1000 independent runs, totaling over 500 CPU days of computation. Results show that SA offers a 4–6 orders of magnitude reduction in optimization time compared to brute force while achieving high-quality solutions. In fact, the globally optimal solution was found with 1.6–90 % probability when the problem size is around 1e9–4e9 possibilities. There is approximately a 90 % probability of finding a solution that is at most 18 % worse than the optimum.
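To make the idea of SA-based task mapping concrete, the following is a minimal sketch, not the paper's SA+AT method or its parameter recommendations. It assumes a toy objective (the load of the busiest PE, ignoring communication and scheduling), a geometric cooling schedule, and a single-task move. The function names (`makespan_cost`, `anneal_mapping`, `brute_force_optimum`) and all default parameter values are hypothetical, chosen only for illustration. A brute-force search is included so the SA result can be compared against the global optimum on a very small instance.

```python
import itertools
import math
import random


def makespan_cost(mapping, task_costs, num_pes):
    """Toy objective (assumption): load of the busiest PE, no communication."""
    loads = [0.0] * num_pes
    for task, pe in enumerate(mapping):
        loads[pe] += task_costs[task]
    return max(loads)


def anneal_mapping(task_costs, num_pes, t0=1.0, t_final=1e-3, alpha=0.95,
                   iters_per_level=None, seed=0):
    """Simulated annealing over task-to-PE mappings; requires num_pes >= 2."""
    rng = random.Random(seed)
    n = len(task_costs)
    if iters_per_level is None:
        # Illustrative choice: number of single-task moves from one mapping.
        iters_per_level = n * (num_pes - 1)
    scale = sum(task_costs)      # normalise cost differences to roughly [0, 1]
    mapping = [0] * n            # start with every task on PE 0
    cost = makespan_cost(mapping, task_costs, num_pes)
    best, best_cost = list(mapping), cost
    t = t0
    while t > t_final:
        for _ in range(iters_per_level):
            # Move: reassign one random task to a different random PE.
            task = rng.randrange(n)
            old_pe = mapping[task]
            new_pe = rng.randrange(num_pes - 1)
            if new_pe >= old_pe:
                new_pe += 1
            mapping[task] = new_pe
            new_cost = makespan_cost(mapping, task_costs, num_pes)
            delta = (new_cost - cost) / scale
            # Accept improvements always, worsenings with prob. exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(mapping), cost
            else:
                mapping[task] = old_pe   # reject: undo the move
        t *= alpha                       # geometric cooling
    return best, best_cost


def brute_force_optimum(task_costs, num_pes):
    """Exhaustive search over all num_pes ** len(task_costs) mappings."""
    n = len(task_costs)
    return min(makespan_cost(m, task_costs, num_pes)
               for m in itertools.product(range(num_pes), repeat=n))


if __name__ == "__main__":
    costs = [4, 3, 3, 2, 2, 2]
    mapping, sa_cost = anneal_mapping(costs, num_pes=2)
    print("SA mapping:", mapping, "cost:", sa_cost)
    print("global optimum:", brute_force_optimum(costs, num_pes=2))
    print("search space size:", 2 ** len(costs))
```

The search space grows as the number of PEs raised to the number of tasks, which is why exhaustive comparison is only feasible for small systems and why a heuristic such as SA becomes attractive as the problem grows.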
Appendix: Convergence results to larger systems with 3–6 PEs
Cite this article
Orsila, H., Salminen, E. & Hämäläinen, T. Recommendations for using Simulated Annealing in task mapping. Des Autom Embed Syst 17, 53–85 (2013). https://doi.org/10.1007/s10617-013-9119-0
DOI: https://doi.org/10.1007/s10617-013-9119-0