Mapping the Energy Landscape

Chapter in Monte Carlo Methods

Abstract

In many statistical learning problems, optimization is performed on a target function that is highly non-convex. A large body of research has been devoted either to approximating the target function by a related convex function, such as replacing the L0 norm with the L1 norm in regression models, or to designing algorithms that find a good local optimum, such as the Expectation-Maximization algorithm for clustering. The task of analyzing the non-convex structure of a target function has received much less attention. In this chapter, inspired by successful visualizations of landscapes for molecular systems [2] and spin-glass models [40], we compute Energy Landscape Maps (ELMs) in high-dimensional spaces. The first half of the chapter explores and visualizes the model space (i.e., the hypothesis space in the machine learning literature) for clustering, bi-clustering, and grammar learning. The second half introduces a novel MCMC method for identifying macroscopic structures in locally noisy energy landscapes. The technique is applied to explore the formation of stable concepts in deep network models of images.

“By visualizing information we turn it into a landscape that you can explore with your eyes: a sort of information map. And when you’re lost in information, an information map is kind of useful.”

– David McCandless
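
To make the mapping idea concrete, the following is a minimal one-dimensional sketch, not the chapter's actual algorithm: it draws Metropolis-Hastings samples from a toy non-convex energy, sends each sample to its basin's minimum by gradient descent, and estimates the energy barrier between neighboring minima by a grid search along the connecting path. The toy energy, the helper names (energy, descend, metropolis), and all parameter values are illustrative assumptions.

    import numpy as np

    # A minimal 1-D illustration of the ELM idea (illustrative sketch,
    # not the chapter's algorithm). The toy energy, step sizes, and
    # sample counts are all assumptions made for this example.

    def energy(x):
        # Toy non-convex target with several local minima.
        return 0.1 * x ** 2 + np.sin(3.0 * x)

    def grad(x, h=1e-5):
        # Central finite-difference derivative of the energy.
        return (energy(x + h) - energy(x - h)) / (2.0 * h)

    def descend(x, lr=0.01, steps=2000):
        # Gradient descent to the minimum of the basin containing x.
        for _ in range(steps):
            x -= lr * grad(x)
        return round(float(x), 2)  # quantize so one basin -> one key

    def metropolis(n=20000, temp=1.0, step=0.5, x0=0.0, seed=0):
        # Random-walk Metropolis sampler targeting exp(-energy/temp).
        rng = np.random.default_rng(seed)
        x, out = x0, []
        for _ in range(n):
            y = x + step * rng.standard_normal()
            if rng.random() < np.exp((energy(x) - energy(y)) / temp):
                x = y
            out.append(x)
        return np.asarray(out)

    samples = metropolis()
    minima = sorted({descend(x) for x in samples[::50]})
    print("local minima:", minima)

    # Barrier between neighboring minima: highest energy on a dense
    # grid connecting them (a crude stand-in for ridge estimation).
    for a, b in zip(minima, minima[1:]):
        print(f"barrier({a}, {b}) = {energy(np.linspace(a, b, 200)).max():.3f}")

In the high-dimensional settings treated in this chapter, both the basin-finding and the barrier-estimation steps require the MCMC machinery developed in the text; the 1-D grid search above would not scale.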

References

  1. Barbu A, Zhu S-C (2005) Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities. IEEE Trans Pattern Anal Mach Intell 27(8):1239–1253

  2. Becker OM, Karplus M (1997) The topology of multidimensional potential energy surfaces: theory and application to peptide structure and kinetics. J Chem Phys 106(4):1495–1517

  3. Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: ICML, pp 41–48

  4. Blake CL, Merz CJ (1998) UCI repository of machine learning databases. University of California, Irvine

  5. Bovier A, den Hollander F (2006) Metastability: a potential theoretic approach. Int Cong Math 3:499–518

  6. Brooks SP, Gelman A (1998) General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 7(4):434–455

  7. Charniak E (2001) Immediate-head parsing for language models. In: Proceedings of the 39th annual meeting on association for computational linguistics, pp 124–131

  8. Collins M (1999) Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania

  9. Dasgupta S, Schulman LJ (2000) A two-round variant of EM for Gaussian mixtures. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence (UAI’00), pp 152–159

  10. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  11. Elman JL (1993) Learning and development in neural networks: the importance of starting small. Cognition 48(1):71–99

  12. Ganchev K, Graça J, Gillenwater J, Taskar B (2010) Posterior regularization for structured latent variable models. J Mach Learn Res 11:2001–2049

  13. Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472

  14. Geyer CJ, Thompson EA (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. J Am Stat Assoc 90(431):909–920

  15. Headden WP III, Johnson M, McClosky D (2009) Improving unsupervised dependency parsing with richer contexts and smoothing. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, pp 101–109

  16. Hill M, Nijkamp E, Zhu S-C (2019) Building a telescope to look into high-dimensional image spaces. Q Appl Math 77(2):269–321

  17. Julesz B (1962) Visual pattern discrimination. IRE Trans Inf Theory 8(2):84–92

  18. Julesz B (1981) Textons, the elements of texture perception, and their interactions. Nature 290:91–97

  19. Klein D, Manning CD (2004) Corpus-based induction of syntactic structure: models of dependency and constituency. In: Proceedings of the 42nd annual meeting on association for computational linguistics, p 478

  20. Kübler S, McDonald R, Nivre J (2009) Dependency parsing. Synth Lect Hum Lang Technol 1(1):1–127

  21. Liang F (2005) A generalized Wang-Landau algorithm for Monte Carlo computation. J Am Stat Assoc 100(472):1311–1327

  22. Liang F, Liu C, Carroll RJ (2007) Stochastic approximation in Monte Carlo computation. J Am Stat Assoc 102(477):305–320

  23. Marinari E, Parisi G (1992) Simulated tempering: a new Monte Carlo scheme. EPL (Europhys Lett) 19(6):451

  24. Mel’čuk IA (1988) Dependency syntax: theory and practice. SUNY Press, New York

  25. Onuchic JN, Luthey-Schulten Z, Wolynes PG (1997) Theory of protein folding: the energy landscape perspective. Ann Rev Phys Chem 48(1):545–600

  26. Pavlovskaia M (2014) Mapping highly nonconvex energy landscapes in clustering, grammatical and curriculum learning. PhD thesis, UCLA

  27. Pavlovskaia M, Tu K, Zhu S-C (2015) Mapping the energy landscape of non-convex optimization problems. In: International workshop on energy minimization methods in computer vision and pattern recognition. Springer, pp 421–435

  28. Rohde DLT, Plaut DC (1999) Language acquisition in the absence of explicit negative evidence: how important is starting small? Cognition 72(1):67–109

  29. Samdani R, Chang M-W, Roth D (2012) Unified expectation maximization. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics, pp 688–698

  30. Spitkovsky VI, Alshawi H, Jurafsky D (2010) From baby steps to leapfrog: how “less is more” in unsupervised dependency parsing. In: NAACL

  31. Swendsen RH, Wang J-S (1987) Nonuniversal critical dynamics in Monte Carlo simulations. Phys Rev Lett 58(2):86–88

  32. Tu K, Honavar V (2011) On the utility of curricula in unsupervised learning of probabilistic grammars. In: IJCAI proceedings-international joint conference on artificial intelligence, vol 22, p 1523

  33. Tu K, Honavar V (2012) Unambiguity regularization for unsupervised learning of probabilistic grammars. In: Proceedings of the 2012 conference on empirical methods in natural language processing and natural language learning (EMNLP-CoNLL 2012)

  34. Wales DJ, Doye JPK (1997) Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J Phys Chem A 101(28):5111–5116

  35. Wales DJ, Trygubenko SA (2004) A doubly nudged elastic band method for finding transition states. J Chem Phys 120:2082–2094

  36. Wang F, Landau DP (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 86(10):2050

  37. Wu YN, Guo C-E, Zhu S-C (2007) From information scaling of natural images to regimes of statistical models. Q Appl Math 66(1):81–122

  38. Xie J, Lu Y, Wu YN (2018) Cooperative learning of energy-based model and latent variable model via MCMC teaching. In: AAAI

  39. Zhou Q (2011) Multi-domain sampling with applications to structural inference of Bayesian networks. J Am Stat Assoc 106(496):1317–1330

  40. Zhou Q (2011) Random walk over basins of attraction to construct Ising energy landscapes. Phys Rev Lett 106(18):180602

  41. Zhou Q, Wong WH (2008) Reconstructing the energy landscape of a distribution from Monte Carlo samples. Ann Appl Stat 2:1307–1331

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

Cite this chapter

Barbu, A., Zhu, SC. (2020). Mapping the Energy Landscape. In: Monte Carlo Methods. Springer, Singapore. https://doi.org/10.1007/978-981-13-2971-5_11
