Abstract
In many statistical learning problems, optimization is performed on a target function that is highly non-convex. A large body of research has been devoted either to approximating the target function by a related convex function, such as replacing the L0 norm with the L1 norm in regression models, or to designing algorithms that find a good local optimum, such as the Expectation-Maximization algorithm for clustering. The task of analyzing the non-convex structure of a target function has received much less attention. In this chapter, inspired by successful visualizations of landscapes for molecular systems [2] and spin-glass models [40], we compute Energy Landscape Maps (ELMs) in high-dimensional spaces. The first half of the chapter explores and visualizes the model space (i.e., the hypothesis space in the machine learning literature) for clustering, bi-clustering, and grammar learning. The second half of the chapter introduces a novel MCMC method for identifying macroscopic structures in locally noisy energy landscapes. The technique is applied to explore the formation of stable concepts in deep network models of images.
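The core idea behind an ELM can be illustrated in one dimension. The following sketch (an illustrative toy, not the chapter's high-dimensional method) samples a two-basin energy with plain Metropolis MCMC, assigns each sample to a basin by gradient descent, and tallies the probability mass per basin; the energy function, step sizes, and rounding thresholds are all assumptions chosen for this example:

```python
import math
import random

def energy(x):
    # Illustrative 1-D non-convex energy with two basins; the left well
    # (near x = -1) is deeper because of the linear tilt term.
    return (x ** 2 - 1.0) ** 2 + 0.2 * x

def local_minimum(x, step=5e-3, iters=5000):
    # Assign x to a basin of attraction via gradient descent,
    # using a finite-difference gradient of the energy.
    for _ in range(iters):
        g = (energy(x + 1e-6) - energy(x - 1e-6)) / 2e-6
        x -= step * g
    return round(x, 1)  # the two minima are well separated, so 1 decimal suffices

def metropolis(n_samples=20000, temperature=0.5, seed=0):
    # Plain Metropolis sampling of the Gibbs distribution exp(-E(x)/T).
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, 1.0)
        if rng.random() < math.exp(min(0.0, -(energy(y) - energy(x)) / temperature)):
            x = y
        samples.append(x)
    return samples

# Map each sample to its basin and tally the mass per basin --
# a crude one-dimensional analogue of an Energy Landscape Map.
cache, counts = {}, {}
for s in metropolis():
    key = round(s, 2)          # cache descents from nearby starting points
    if key not in cache:
        cache[key] = local_minimum(key)
    counts[cache[key]] = counts.get(cache[key], 0) + 1
print({m: c for m, c in sorted(counts.items())})
```

With the tilted double well, the sampler visits both basins and assigns more mass to the deeper one near x = -1; in high dimensions the same two ingredients (a sampler that crosses barriers and a descent rule that labels basins) are what an ELM scales up.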
“By visualizing information we turn it into a landscape that you can explore with your eyes: a sort of information map. And when you’re lost in information, an information map is kind of useful.”
– David McCandless
References
Barbu A, Zhu S-C (2005) Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities. IEEE Trans Pattern Anal Mach Intell 27(8):1239–1253
Becker OM, Karplus M (1997) The topology of multidimensional potential energy surfaces: theory and application to peptide structure and kinetics. J Chem Phys 106(4):1495–1517
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: ICML, pp 41–48
Blake CL, Merz CJ (1998) UCI repository of machine learning databases. University of California, Irvine
Bovier A, den Hollander F (2006) Metastability: a potential theoretic approach. Int Cong Math 3:499–518
Brooks SP, Gelman A (1998) General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 7(4):434–455
Charniak E (2001) Immediate-head parsing for language models. In: Proceedings of the 39th annual meeting on association for computational linguistics, pp 124–131
Collins M (1999) Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania
Dasgupta S, Schulman LJ (2000) A two-round variant of EM for Gaussian mixtures. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence (UAI’00), pp 152–159
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38
Elman JL (1993) Learning and development in neural networks: the importance of starting small. Cognition 48(1):71–99
Ganchev K, Graça J, Gillenwater J, Taskar B (2010) Posterior regularization for structured latent variable models. J Mach Learn Res 11:2001–2049
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Geyer CJ, Thompson EA (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. J Am Stat Assoc 90(431):909–920
Headden WP III, Johnson M, McClosky D (2009) Improving unsupervised dependency parsing with richer contexts and smoothing. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, pp 101–109
Hill M, Nijkamp E, Zhu S-C (2019) Building a telescope to look into high-dimensional image spaces. Q Appl Math 77(2):269–321
Julesz B (1962) Visual pattern discrimination. IRE Trans Inf Theory 8(2):84–92
Julesz B (1981) Textons, the elements of texture perception, and their interactions. Nature 290:91–97
Klein D, Manning CD (2004) Corpus-based induction of syntactic structure: models of dependency and constituency. In: Proceedings of the 42nd annual meeting on association for computational linguistics, p 478
Kübler S, McDonald R, Nivre J (2009) Dependency parsing. Synth Lect Hum Lang Technol 1(1):1–127
Liang F (2005) A generalized Wang-Landau algorithm for Monte Carlo computation. J Am Stat Assoc 100(472):1311–1327
Liang F, Liu C, Carroll RJ (2007) Stochastic approximation in Monte Carlo computation. J Am Stat Assoc 102(477):305–320
Marinari E, Parisi G (1992) Simulated tempering: a new Monte Carlo scheme. EPL (Europhys Lett) 19(6):451
Mel’čuk IA (1988) Dependency syntax: theory and practice. SUNY Press, New York
Onuchic JN, Luthey-Schulten Z, Wolynes PG (1997) Theory of protein folding: the energy landscape perspective. Ann Rev Phys Chem 48(1):545–600
Pavlovskaia M (2014) Mapping highly nonconvex energy landscapes in clustering, grammatical and curriculum learning. PhD thesis, UCLA
Pavlovskaia M, Tu K, Zhu S-C (2015) Mapping the energy landscape of non-convex optimization problems. In: International workshop on energy minimization methods in computer vision and pattern recognition. Springer, pp 421–435
Rohde DLT, Plaut DC (1999) Language acquisition in the absence of explicit negative evidence: how important is starting small? Cognition 72(1):67–109
Samdani R, Chang M-W, Roth D (2012) Unified expectation maximization. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics, pp 688–698
Spitkovsky VI, Alshawi H, Jurafsky D (2010) From baby steps to leapfrog: how “less is more” in unsupervised dependency parsing. In: NAACL
Swendsen RH, Wang J-S (1987) Nonuniversal critical dynamics in Monte Carlo simulations. Phys Rev Lett 58(2):86–88
Tu K, Honavar V (2011) On the utility of curricula in unsupervised learning of probabilistic grammars. In: IJCAI proceedings-international joint conference on artificial intelligence, vol 22, p 1523
Tu K, Honavar V (2012) Unambiguity regularization for unsupervised learning of probabilistic grammars. In: Proceedings of the 2012 conference on empirical methods in natural language processing and natural language learning (EMNLP-CoNLL 2012)
Wales DJ, Doye JPK (1997) Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J Phys Chem A 101(28):5111–5116
Wales DJ, Trygubenko SA (2004) A doubly nudged elastic band method for finding transition states. J Chem Phys 120:2082–2094
Wang F, Landau DP (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 86(10):2050
Wu YN, Guo C-E, Zhu S-C (2007) From information scaling of natural images to regimes of statistical models. Q Appl Math 66(1):81–122
Xie J, Lu Y, Wu YN (2018) Cooperative learning of energy-based model and latent variable model via MCMC teaching. In: AAAI
Zhou Q (2011) Multi-domain sampling with applications to structural inference of Bayesian networks. J Am Stat Assoc 106(496):1317–1330
Zhou Q (2011) Random walk over basins of attraction to construct Ising energy landscapes. Phys Rev Lett 106(18):180602
Zhou Q, Wong WH (2008) Reconstructing the energy landscape of a distribution from Monte Carlo samples. Ann Appl Stat 2:1307–1331
© 2020 Springer Nature Singapore Pte Ltd.
Cite this chapter
Barbu, A., Zhu, SC. (2020). Mapping the Energy Landscape. In: Monte Carlo Methods. Springer, Singapore. https://doi.org/10.1007/978-981-13-2971-5_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2970-8
Online ISBN: 978-981-13-2971-5
eBook Packages: Mathematics and Statistics (R0)