
Which Hype for My New Task? Hints and Random Search for Echo State Networks Hyperparameters

Conference paper in Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Abstract

In learning systems, hyperparameters are parameters that are not learned but need to be set a priori. In Reservoir Computing, several such parameters need to be set a priori depending on the task. Newcomers to Reservoir Computing often lack a good intuition on which hyperparameters to tune and how to tune them. For instance, beginners often explore the reservoir sparsity, but in practice this parameter has little influence on ESN performance. Most importantly, many authors keep performing suboptimal hyperparameter searches: using grid search as a tool to explore more than two hyperparameters, while restraining the spectral radius to be below unity. In this short paper, we give some suggestions and intuitions, and present a general method to find robust hyperparameters while understanding their influence on performance. We also provide a graphical interface (included in ReservoirPy) to make this hyperparameter search more intuitive. Finally, we discuss some potential refinements of the proposed method.


Notes

  1. We call evaluations the training of instances of reservoirs for sets of HP values.

  2. https://hal.inria.fr/hal-03203318/document.

  3. A figure summarizing the results of this first random search on the Lorenz time series prediction task is available here: https://hal.inria.fr/hal-03203318/document.

  4. W proba and \(W_{in}\) proba have little influence on the error distribution (diagonal plots), and no linear dependence with the other parameters can be seen in the figure available here: https://hal.inria.fr/hal-03203318/document.

  5. With grid search, exploring v values for each of p HPs requires \(v^p\) evaluations (e.g. 10 values for each of 4 HPs already requires 10,000 evaluations). With random search, each HP is sampled anew at every evaluation, so v simply equals the total number of evaluations.

  6. For the interdependence between IS and ridge, see https://hal.inria.fr/hal-03203318/document.

  7. Of course, as many variables are plotted on log scales, the relation would often take the form \(\log(Y) = a\log(X) + b\).

References

  1. Ferreira, A., et al.: An approach to reservoir computing design and training. Expert Syst. Appl. 40(10), 4172–4182 (2013)

  2. Schrauwen, B., et al.: An overview of reservoir computing: theory, applications and implementations. In: Proceedings of ESANN, pp. 471–482 (2007)

  3. Jaeger, H., et al.: Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 20(3), 335–352 (2007)

  4. Bergstra, J., et al.: Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of SciPy, pp. 13–20. Citeseer (2013)

  5. Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40

  6. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)

  7. Hinaut, X.: Which input abstraction is better for a robot syntax acquisition model? Phonemes, words or grammatical constructions? In: ICDL-EpiRob (2018)

  8. Jaeger, H.: The “echo state” approach to analysing and training recurrent neural networks. GMD Technical Report 148, German National Research Center for Information Technology, Bonn, Germany (2001)

  9. Langton, C.G.: Computation at the edge of chaos: phase transitions and emergent computation. Physica D 42(1–3), 12–37 (1990)

  10. Legenstein, R., Maass, W.: Edge of chaos and prediction of computational performance for neural circuit models. Neural Netw. 20(3), 323–334 (2007)

  11. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)

  12. Lukoševičius, M.: A practical guide to applying echo state networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 659–686. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_36

  13. Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3(3), 127–149 (2009)

  14. Mackey, M.C., Glass, L.: Oscillation and chaos in physiological control systems. Science 197(4300), 287–289 (1977)

  15. McInnes, L., Healy, J., Saul, N., Großberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018)

  16. Variengien, A., Hinaut, X.: A journey in ESN and LSTM visualisations on a language task. arXiv preprint arXiv:2012.01748 (2020)

  17. Yperman, J., Becker, T.: Bayesian optimization of hyper-parameters in reservoir computing. arXiv preprint arXiv:1611.05193 (2016)


Author information

Correspondence to Xavier Hinaut.

5 Supplementary Material

5.1 Implementation details

We used the ReservoirPy library [5], which has been interfaced with hyperopt [4], to run the following experiments and produce the figures exploring the HPs. We illustrate the proposed method on two chaotic time series prediction tasks. These tasks consist in predicting the value of the series at time step \(t+1\) given its value at time step t. To assess the performance of our models on these tasks, we performed cross-validation. Each fold is composed of a training and a validation data set. The folds are defined as continuous slices of the original series that overlap in time: e.g. the validation set of the first fold is the training set of the second, and the two sets are adjacent sequences of the series. For the last fold, the training set is the last available slice of data, while the training set of the first fold is used as its validation set.
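For illustration, such a random search can be set up with a few lines of hyperopt on top of ReservoirPy. The sketch below is a minimal, hedged example assuming the ReservoirPy 0.3 node API (Reservoir, Ridge, reservoirpy.datasets.mackey_glass) and hyperopt's random-search algorithm; the reservoir size, priors, number of seeds and the single train/validation split are illustrative choices, not the exact setup of the paper, which uses the 3-fold scheme described above.

```python
import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, rand
from reservoirpy.datasets import mackey_glass
from reservoirpy.nodes import Reservoir, Ridge

# One-step-ahead prediction task: predict x(t+1) from x(t).
series = mackey_glass(6000)
X, y = series[:-1], series[1:]
X_train, y_train, X_val, y_val = X[:4000], y[:4000], X[4000:], y[4000:]

def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the target series."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def objective(params):
    """Train a few ESN instances with the sampled HPs and return their mean NRMSE."""
    losses = []
    for seed in range(3):  # the paper averages over 5 reservoir initializations per fold
        reservoir = Reservoir(300,
                              sr=params["sr"],
                              lr=params["lr"],
                              input_scaling=params["input_scaling"],
                              seed=seed)
        readout = Ridge(ridge=params["ridge"])
        esn = reservoir >> readout
        esn.fit(X_train, y_train, warmup=100)
        losses.append(nrmse(y_val, esn.run(X_val)))
    return {"loss": float(np.mean(losses)), "status": STATUS_OK}

# Log-uniform priors spanning several orders of magnitude around plausible values.
space = {
    "sr":            hp.loguniform("sr", np.log(1e-2), np.log(1e1)),
    "lr":            hp.loguniform("lr", np.log(1e-3), np.log(1e0)),
    "input_scaling": hp.loguniform("input_scaling", np.log(1e-2), np.log(1e1)),
    "ridge":         hp.loguniform("ridge", np.log(1e-8), np.log(1e0)),
}

trials = Trials()
best = fmin(objective, space, algo=rand.suggest, max_evals=200, trials=trials)
print(best)  # HP values of the best trial; all evaluations are stored in `trials`
```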

For all tasks, we used a 3-fold cross-validation measure on a time series composed of 6000 time steps, i.e. each fold is composed of 4000 time steps, with a training set and a validation set of 2000 time steps each. We used two metrics for this measure: the Normalized Root Mean Squared Error (NRMSE), defined in Eq. 1, and the coefficient of determination \(R^2\), defined in Eq. 2:

$$\begin{aligned} \mathrm{NRMSE}(y, \hat{y}) = \frac{\sqrt{\frac{1}{N}\sum_{t=0}^{N-1} (y_t - \hat{y}_t)^2}}{\max y - \min y} \end{aligned}$$
(1)
$$\begin{aligned} R^2(y, \hat{y}) = 1 - \frac{\sum_{t=0}^{N-1}(y_t - \hat{y}_t)^2}{\sum_{t=0}^{N-1}(y_t - \bar{y})^2} \end{aligned}$$
(2)

where y is a time series defined over N time steps, \(\hat{y}\) is the estimated time series predicted by the model, and \(\bar{y}\) is the average value of the time series y over time. NRMSE was used as an error measure, which we expect to reach a value near 0, while \(R^2\) was used as a score, which we expect to reach 1, its maximum possible value. All measures were obtained by averaging these two metrics across all folds, with 5 different initializations of the models for each fold (Figs. 5 and 6).
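For reference, the two metrics translate directly into NumPy. The sketch below is a plain transcription of Eqs. 1 and 2; the function names are ours and not part of ReservoirPy.

```python
import numpy as np

def nrmse(y, y_hat):
    """Normalized RMSE (Eq. 1): RMSE divided by the range of the target series."""
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    return rmse / (np.max(y) - np.min(y))

def r2(y, y_hat):
    """Coefficient of determination R^2 (Eq. 2): 1 is perfect,
    negative values mean worse than predicting the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```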

Fig. 5. Restricted search on the IS parameter, for the Mackey-Glass task. The fixed value of 1 chosen at the beginning of the search for IS might not be optimal given all the other chosen parameters. In the case of the Mackey-Glass task, the top-left and bottom-left plots clearly show that better results are achieved with lower values, the distribution of the top 10% trials being centered around a median of 0.3.
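A restricted search like the one in Fig. 5 can be expressed by freezing every dimension of the hyperopt space except IS. This hypothetical sketch reuses the `objective` function from the earlier random-search sketch; the frozen values below are illustrative placeholders, not the values retained in the paper.

```python
import numpy as np
from hyperopt import Trials, fmin, hp, rand

# Freeze every HP at the value retained from the large random search
# (placeholders here) and leave only the input scaling (IS) free.
restricted_space = {
    "sr": 0.9,       # hypothetical retained value
    "lr": 0.5,       # hypothetical retained value
    "ridge": 1e-5,   # hypothetical retained value
    "input_scaling": hp.loguniform("input_scaling", np.log(1e-2), np.log(1e1)),
}

# `objective` is the same function as in the random-search sketch above.
best_is = fmin(objective, restricted_space, algo=rand.suggest,
               max_evals=100, trials=Trials())
```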

Fig. 6. Large search dependence plot for the Lorenz task, with 125 trials. The diagonal of the plot matrix shows the relation between each explored parameter and the error value, also referred to as loss value. The top 10% of trials in terms of score (here, \(R^{2}\)) are shown using colors from yellow (top 10) to red (top 1) in all plots. The off-diagonal plots show the interactions between all possible pairs of parameters. In these plots, the error value is represented using shades of purple and blue on a logarithmic scale, while the score is represented by circle size, normalized with respect to the score values; because \(R^{2}\) can take values between \(-\infty\) and 1, the smallest dots represent negative values. Finally, the bottom row of plots shows, for each parameter, the distribution of the top 10% of trials in terms of score.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hinaut, X., Trouvain, N. (2021). Which Hype for My New Task? Hints and Random Search for Echo State Networks Hyperparameters. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_7


  • DOI: https://doi.org/10.1007/978-3-030-86383-8_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86382-1

  • Online ISBN: 978-3-030-86383-8

