
Which Hype for My New Task? Hints and Random Search for Echo State Networks Hyperparameters

Conference paper in Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Abstract

In learning systems, hyperparameters are parameters that are not learned but need to be set a priori. In Reservoir Computing, several such parameters need to be set a priori depending on the task. Newcomers to Reservoir Computing often lack a good intuition on which hyperparameters to tune and how to tune them. For instance, beginners often explore the reservoir sparsity, but in practice this parameter has little influence on ESN performance. Most importantly, many authors keep performing suboptimal hyperparameter searches: using grid search as a tool to explore more than two hyperparameters, while restraining the spectral radius to be below unity. In this short paper, we give some suggestions and intuitions, and present a general method to find robust hyperparameters while understanding their influence on performance. We also provide a graphical interface (included in ReservoirPy) to make this hyperparameter search more intuitive. Finally, we discuss some potential refinements of the proposed method.


Notes

  1. We call evaluations the training of instances of reservoirs for sets of HP values.

  2. https://hal.inria.fr/hal-03203318/document.

  3. A figure summarizing the results of this first random search on the Lorenz time series prediction task is available here: https://hal.inria.fr/hal-03203318/document.

  4. W proba and \(W_{in}\) proba have little influence on the error distribution (diagonal plots), and no linear dependence with the other parameters can be seen in the figure available here: https://hal.inria.fr/hal-03203318/document.

  5. With grid search, exploring v values for each of p HPs requires \(v^p\) evaluations (e.g. 10 values for each of 4 HPs already requires 10,000 evaluations). With random search, each HP is sampled anew at every evaluation, so v simply equals the total number of evaluations.

  6. For the interdependence between IS and ridge, see https://hal.inria.fr/hal-03203318/document.

  7. Of course, as many variables are plotted on log scales, the relation would often take the form \(\log(Y) = a\log(X) + b\).

References

  1. Ferreira, A., et al.: An approach to reservoir computing design and training. Expert Syst. Appl. 40(10), 4172–4182 (2013)

  2. Schrauwen, B., et al.: An overview of reservoir computing: theory, applications and implementations. In: Proceedings of ESANN, pp. 471–482 (2007)

  3. Jaeger, H., et al.: Optimization and applications of echo state networks with leaky-integrator neurons. Neural Netw. 20(3), 335–352 (2007)

  4. Bergstra, J., et al.: Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of SciPy, pp. 13–20. Citeseer (2013)

  5. Trouvain, N., Pedrelli, L., Dinh, T.T., Hinaut, X.: ReservoirPy: an efficient and user-friendly library to design echo state networks. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 494–505. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_40

  6. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)

  7. Hinaut, X.: Which input abstraction is better for a robot syntax acquisition model? Phonemes, words or grammatical constructions? In: ICDL-EpiRob (2018)

  8. Jaeger, H.: The “echo state” approach to analysing and training recurrent neural networks. GMD Technical Report 148, German National Research Center for Information Technology, Bonn, Germany (2001)

  9. Langton, C.G.: Computation at the edge of chaos: phase transitions and emergent computation. Physica D 42(1–3), 12–37 (1990)

  10. Legenstein, R., Maass, W.: Edge of chaos and prediction of computational performance for neural circuit models. Neural Netw. 20(3), 323–334 (2007)

  11. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–141 (1963)

  12. Lukoševičius, M.: A practical guide to applying echo state networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 659–686. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_36

  13. Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3(3), 127–149 (2009)

  14. Mackey, M.C., Glass, L.: Oscillation and chaos in physiological control systems. Science 197(4300), 287–289 (1977)

  15. McInnes, L., Healy, J., Saul, N., Großberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018)

  16. Variengien, A., Hinaut, X.: A journey in ESN and LSTM visualisations on a language task. arXiv preprint arXiv:2012.01748 (2020)

  17. Yperman, J., Becker, T.: Bayesian optimization of hyper-parameters in reservoir computing. arXiv preprint arXiv:1611.05193 (2016)


Author information

Correspondence to Xavier Hinaut.

5 Supplementary Material

5.1 Implementation details

We used the ReservoirPy library [5], which has been interfaced with hyperopt [4], to run the following experiments and produce the figures exploring the HPs. We illustrate the proposed method on two chaotic time series prediction tasks. These tasks consist in predicting the value of the series at time step \(t+1\) given its value at time step t. To assess the performance of our models on these tasks, we performed cross-validation. Each fold is composed of a training and a validation data set. The folds are defined as continuous slices of the original series that overlap in time: e.g. the validation set of the first fold is the training set of the second, and the two sets are adjacent sequences of the series. For the last fold, the training set is the last available slice of data, while the training set of the first fold is used as its validation set.
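For illustration, such a random search can be set up with a few lines of hyperopt on top of ReservoirPy. The sketch below is a minimal, hedged example assuming the ReservoirPy 0.3 node API (Reservoir, Ridge, reservoirpy.datasets.mackey_glass) and hyperopt's random-search algorithm; the reservoir size, priors, number of seeds and the single train/validation split are illustrative choices, not the exact setup of the paper, which uses the 3-fold scheme described above.

```python
import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, rand
from reservoirpy.datasets import mackey_glass
from reservoirpy.nodes import Reservoir, Ridge

# One-step-ahead prediction task: predict x(t+1) from x(t).
series = mackey_glass(6000)
X, y = series[:-1], series[1:]
X_train, y_train, X_val, y_val = X[:4000], y[:4000], X[4000:], y[4000:]

def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the target series."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

def objective(params):
    """Train a few ESN instances with the sampled HPs and return their mean NRMSE."""
    losses = []
    for seed in range(3):  # the paper averages over 5 reservoir initializations per fold
        reservoir = Reservoir(300,
                              sr=params["sr"],
                              lr=params["lr"],
                              input_scaling=params["input_scaling"],
                              seed=seed)
        readout = Ridge(ridge=params["ridge"])
        esn = reservoir >> readout
        esn.fit(X_train, y_train, warmup=100)
        losses.append(nrmse(y_val, esn.run(X_val)))
    return {"loss": float(np.mean(losses)), "status": STATUS_OK}

# Log-uniform priors spanning several orders of magnitude around plausible values.
space = {
    "sr":            hp.loguniform("sr", np.log(1e-2), np.log(1e1)),
    "lr":            hp.loguniform("lr", np.log(1e-3), np.log(1e0)),
    "input_scaling": hp.loguniform("input_scaling", np.log(1e-2), np.log(1e1)),
    "ridge":         hp.loguniform("ridge", np.log(1e-8), np.log(1e0)),
}

trials = Trials()
best = fmin(objective, space, algo=rand.suggest, max_evals=200, trials=trials)
print(best)  # HP values of the best trial; all evaluations are stored in `trials`
```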

For all tasks, we used a 3-fold cross-validation measure on a time series composed of 6000 time steps, i.e. each fold is composed of 4000 time steps, with a training set and a validation set of 2000 time steps each. We used two metrics for this measure: the Normalized Root Mean Squared Error (NRMSE), defined in Eq. 1, and the coefficient of determination \(R^2\), defined in Eq. 2:

$$\begin{aligned} \mathrm{NRMSE}(y, \hat{y}) = \frac{\sqrt{\frac{1}{N}\sum_{t=0}^{N-1} (y_t - \hat{y}_t)^2}}{\max y - \min y} \end{aligned}$$
(1)
$$\begin{aligned} R^2(y, \hat{y}) = 1 - \frac{\sum_{t=0}^{N-1}(y_t - \hat{y}_t)^2}{\sum_{t=0}^{N-1}(y_t - \bar{y})^2} \end{aligned}$$
(2)

where y is a time series defined over N time steps, \(\hat{y}\) is the estimated time series predicted by the model, and \(\bar{y}\) is the average value of the time series y over time. NRMSE was used as an error measure, which we expect to reach a value near 0, while \(R^2\) was used as a score, which we expect to reach 1, its maximum possible value. All measures were obtained by averaging these two metrics across all folds, with 5 different initializations of the models for each fold (Figs. 5 and 6).
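For reference, the two metrics translate directly into NumPy. The sketch below is a plain transcription of Eqs. 1 and 2; the function names are ours and not part of ReservoirPy.

```python
import numpy as np

def nrmse(y, y_hat):
    """Normalized RMSE (Eq. 1): RMSE divided by the range of the target series."""
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    return rmse / (np.max(y) - np.min(y))

def r2(y, y_hat):
    """Coefficient of determination R^2 (Eq. 2): 1 is perfect,
    negative values mean worse than predicting the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```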

Fig. 5. Restricted search on the IS parameter, for the Mackey-Glass task. The fixed value of 1 chosen at the beginning of the search for IS might not be optimal given all the other chosen parameters. In the case of the Mackey-Glass task, the top-left and bottom-left plots clearly show that better results are achieved with lower values, the distribution of the top 10% trials being centered around a median of 0.3.
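A restricted search like the one in Fig. 5 can be expressed by freezing every dimension of the hyperopt space except IS. This hypothetical sketch reuses the `objective` function from the earlier random-search sketch; the frozen values below are illustrative placeholders, not the values retained in the paper.

```python
import numpy as np
from hyperopt import Trials, fmin, hp, rand

# Freeze every HP at the value retained from the large random search
# (placeholders here) and leave only the input scaling (IS) free.
restricted_space = {
    "sr": 0.9,       # hypothetical retained value
    "lr": 0.5,       # hypothetical retained value
    "ridge": 1e-5,   # hypothetical retained value
    "input_scaling": hp.loguniform("input_scaling", np.log(1e-2), np.log(1e1)),
}

# `objective` is the same function as in the random-search sketch above.
best_is = fmin(objective, restricted_space, algo=rand.suggest,
               max_evals=100, trials=Trials())
```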

Fig. 6. Large search dependence plot for the Lorenz task, with 125 trials. The diagonal of the plot matrix shows the relation between each explored parameter and the error value, also referred to as loss value. The top 10% of trials in terms of score (here, \(R^{2}\)) are shown using colors from yellow (top 10) to red (top 1) in all plots. The off-diagonal plots show the interactions between all possible pairs of parameters. In these plots, the error value is represented using shades of purple and blue on a logarithmic scale, while the score is represented by circle size, normalized with respect to the score values; because \(R^{2}\) can take values between \(-\infty\) and 1, the smallest dots represent negative values. Finally, the bottom row of plots shows, for each parameter, the distribution of the top 10% of trials in terms of score.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hinaut, X., Trouvain, N. (2021). Which Hype for My New Task? Hints and Random Search for Echo State Networks Hyperparameters. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_7


  • DOI: https://doi.org/10.1007/978-3-030-86383-8_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86382-1

  • Online ISBN: 978-3-030-86383-8

