Prior Distribution Selection for a Mixture of Experts

Grabovoy, A. V.; Strijov, V. V.

doi:10.1134/S0965542521070071

Prior Distribution Selection for a Mixture of Experts

INFORMATION SCIENCE
Published: 22 August 2021

Volume 61, pages 1140–1152, (2021)
Cite this article

Computational Mathematics and Mathematical Physics Aims and scope Submit manuscript

A. V. Grabovoy¹ &
V. V. Strijov²

105 Accesses
1 Citation
Explore all metrics

Abstract

The paper investigates a mixture of expert models. The mixture of experts is a combination of experts, local approximation model, and a gate function, which weighs these experts and forms their ensemble. In this work, each expert is a linear model. The gate function is a neural network with softmax on the last layer. The paper analyzes various prior distributions for each expert. The authors propose a method that takes into account the relationship between prior distributions of different experts. The EM algorithm optimises both parameters of the local models and parameters of the gate function. As an application problem, the paper solves a problem of shape recognition on images. Each expert fits one circle in an image and recovers its parameters: the coordinates of the center and the radius. The computational experiment uses synthetic and real data to test the proposed method. The real data is a human eye image from the iris detection problem.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 5.

Fig. 6.

Fig. 7.

A flexible probabilistic framework for large-margin mixture of experts

Article 05 June 2019

Quantum-like Gaussian mixture model

Article 11 June 2021

An effective strategy for initializing the EM algorithm in finite mixture models

Article 03 August 2016

REFERENCES

C. Tianqi and G. Carlos, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).
C. Xi and I. Hemant, “Random forests for genomic data analysis,” Genomics 99 (6), 323–329 (2012).
Article Google Scholar
Y. S. Esen, J. Wilson, and P. D. Gader, “Twenty years of mixture of experts,” IEEE Trans. Neural Networks Learn. Syst. 23 (8), 1177–1193 (2012).
Article Google Scholar
C. E. Rasmussen and Z. Ghahramani, “Infinite mixtures of Gaussian process experts,” in NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (2002), pp. 881–888.
N. Shazeer, A. Mirhoseini, and K. Maziarz, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” International Conference on Learning Representations (2017).
M. I. Jordan, “Hierarchical mixtures of experts and the EM algorithm,” Neural Comput. 6 (2), 181–214 (1994).
Article Google Scholar
M. I. Jordan and R. A. Jacobs, “Hierarchies of adaptive experts,” in NIPS'91: Proceedings of the 4th International Conference on Neural Information Processing Systems (1991), pp. 985–992.
C. Lima, A. Coelho, and F. J. Zuben, “Hybridizing mixtures of experts with support vector machines: Investigation into nonlinear dynamic systems identification,” Inf. Sci. 177 (10), 2049–2074 (2007).
Article Google Scholar
L. Cao, “Support vector machines experts for time series forecasting,” Neurocomputing 51, 321–339 (2003).
Article Google Scholar
M. S. Yumlu, F. S. Gurgen, and N. Okay, “Financial time series prediction using mixture of experts,” in Proceedings of the 18th International Symposium on Computer and Information Sciences (Springer, Berlin, 2003), pp. 553–560.
Y. M. Cheung, W. M. Leung, and L. Xu, “Application of mixture of experts model to financial time series forecasting,” in Proceedings of the International Conference on Neural Networks and Signal Processing (1995), pp. 1–4.
A. S. Weigend and S. Shi, “Predicting daily probability distributions of S&P500 returns,” J. Forecast. 19 (4), 375–392 (2000).
Article Google Scholar
R. Ebrahimpour, M. R. Moradian, A. Esmkhani, and F. M. Jafarlou, “Recognition of Persian handwritten digits using characterization loci and mixture of experts,” J. Digital Content Technol. Appl. 3 (3), 42–46 (2009).
Google Scholar
A. Estabrooks and N. Japkowicz, “A mixture-of-experts framework for text classification,” in Proceedings of Workshop on Computational Natural Language Learning (ACT, 2001), pp. 1–8.
S. Mossavat, O. Amft, B. Petkov Vries, and W. Kleijn, “A Bayesian hierarchical mixture of experts approach to estimate speech quality,” in Proceedings of the 2nd International Workshop on Quality of Multimedia Experience (2010), pp. 200–205.
F. Peng, R. A. Jacobs, and M. A. Tanner, “Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition,” J. Am. Stat. Assoc. 91 (435), 953–960 (1996).
Article Google Scholar
A. Tuerk, PhD Thesis (Cambridge Univ., Cambridge, 2001).
C. Sminchisescu, A. Kanaujia, and D. Metaxas, “Discriminative density propagation for visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell. 29 (11), 2030–2044 (2007).
Article Google Scholar
K. Bowyer, K. Hollingsworth, and P. Flynn, A Survey of Iris Biometrics Research: 2008–2010.
I. Matveev, “Detection of iris in image by interrelated maxima of brightness gradient projections,” Appl. Comput. Math. 9 (2), 252–257 (2010).
MATH Google Scholar
I. Matveev and I. Simonenko, “Detecting precise iris boundaries by circular shortest path method,” Pattern Recogn. Image Anal. 24, 304–309 (2014).
Article Google Scholar
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B (Methodol.) 39 (1), 1–38 (1977).
MathSciNet MATH Google Scholar
C. Bishop, Pattern Recognition and Machine Learning (Springer, Berlin, 2006).
MATH Google Scholar

Download references

Funding

This research was supported by the Russian Foundation for Basic Research (project nos. 19-07-01155, 19-07-0875) and by NTI (project “Mathematical methods of big data analysis” 13/1251/2018).

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Moscow oblast, Russia
A. V. Grabovoy
Moscow Institute of Physics and Technology, Dorodnicyn Computing Centre, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 141701, Dolgoprudny, Moscow oblast, Russia
V. V. Strijov

Authors

A. V. Grabovoy
View author publications
You can also search for this author in PubMed Google Scholar
V. V. Strijov
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to A. V. Grabovoy or V. V. Strijov.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Grabovoy, A.V., Strijov, V.V. Prior Distribution Selection for a Mixture of Experts. Comput. Math. and Math. Phys. 61, 1140–1152 (2021). https://doi.org/10.1134/S0965542521070071

Download citation

Received: 26 November 2020
Revised: 26 November 2020
Accepted: 11 March 2021
Published: 22 August 2021
Issue Date: July 2021
DOI: https://doi.org/10.1134/S0965542521070071

Keywords:

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions