Abstract
Learning an embedding for a large collection of items is a popular approach to overcoming the computational limitations associated with one-hot encodings. The aim of item embeddings is to learn a low-dimensional representation space whose geometry captures features or relationships relevant to the data at hand. This can be achieved, for example, by exploiting adjacencies among items in large sets of unlabelled data. In this paper we interpret the item embeddings obtained from conditional models in an Information Geometric framework. By exploiting the \(\alpha \)-geometry of the exponential family, first introduced by Amari, we introduce a family of natural \(\alpha \)-embeddings, represented by vectors in the tangent space of the probability simplex, which includes as special cases standard approaches available in the literature. A typical example is given by word embeddings, commonly used in natural language processing, such as Word2Vec and GloVe. In our analysis, we show how the \(\alpha \)-deformation parameter impacts performance on standard evaluation tasks.
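To make the role of the \(\alpha \)-deformation parameter concrete, the following sketch computes Amari's \(\alpha \)-representation of a probability vector, the map underlying the \(\alpha \)-geometry mentioned above. This is an illustrative implementation of the standard formula, not the paper's full embedding construction: \(\alpha = 1\) recovers the exponential (logarithmic) representation and \(\alpha = -1\) the mixture (identity) representation.

```python
import numpy as np

def alpha_representation(p, alpha):
    """Amari's alpha-representation of a probability vector p.

    alpha = 1  -> exponential representation, log(p)
    alpha = -1 -> mixture representation, p itself
    otherwise  -> 2/(1 - alpha) * p**((1 - alpha)/2)
    """
    p = np.asarray(p, dtype=float)
    if np.isclose(alpha, 1.0):
        return np.log(p)
    return 2.0 / (1.0 - alpha) * p ** ((1.0 - alpha) / 2.0)
```

Varying \(\alpha \) continuously deforms the representation between these two extremes; for instance, \(\alpha = 0\) yields \(2\sqrt{p}\), the square-root embedding onto a sphere.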
Change history
15 May 2021
A Correction to this paper has been published: https://doi.org/10.1007/s41884-021-00045-7
Notes
In the following, for each word w we assume the \(\text {arg max}\) to be unique. When this is not the case, the formula can be easily generalized.
References
Amari, S.I.: Theory of information spaces: a differential geometrical foundation of statistics. Post RAAG Reports (1980)
Amari, S.I.: Differential geometry of curved exponential families-curvatures and information loss. Ann. Stat. 10, 357–385 (1982)
Amari, S.I.: Geometrical theory of asymptotic ancillarity and conditional inference. Biometrika 69(1), 1–17 (1982)
Amari, S.I.: Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer, New York (1985)
Amari, S.I.: Dual connections on the Hilbert bundles of statistical models. In: Geometrization of Statistical Theory, pp. 123–151. ULDM Publ., Lancaster (1987)
Amari, S.I.: Information Geometry and Its Applications, Applied Mathematical Sciences, vol. 194. Springer, Tokyo (2016)
Amari, S.I., Cichocki, A.: Information geometry of divergence functions. Bull. Polish Acad. Sci. Tech. Sci. 58(1), 183–195 (2010)
Amari, S.I., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society, Providence (2000)
Arora, S., Li, Y., Liang, Y., Ma, T., Risteski, A.: Linear Algebraic Structure of Word Senses, with Applications to Polysemy. arXiv:1601.03764 (2016)
Arora, S., Li, Y., Liang, Y., Ma, T., Risteski, A.: Rand-walk: a latent variable model approach to word embeddings. arXiv:1502.03520 (2016)
Bakarov, A.: A Survey of Word Embeddings Evaluation Methods. arXiv:1801.09536 (2018)
Barkan, O., Koenigstein, N.: ITEM2vec: neural item embedding for collaborative filtering. In: IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2016)
Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: 52nd Annual Meeting of the Association for Computational Linguistics, pp. 238–247 (2014)
Baroni, M., Lenci, A.: How we blessed distributional semantic evaluation. In: Workshop on Geometrical Models of Natural Language Semantics, pp. 1–10 (2011)
Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
Bengio, Y., Simard, P., Frasconi, P., et al.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
Bullinaria, J.A., Levy, J.P.: Extracting semantic representations from word co-occurrence statistics: a computational study. Behav. Res. Methods 39(3), 510–526 (2007)
Bullinaria, J.A., Levy, J.P.: Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behav. Res. Methods 44(3), 890–907 (2012)
Casella, G., Berger, R.L.: Statistical Inference, 2nd edn. Duxbury Press, California (2001)
Coenen, A., Reif, E., Yuan, A., Kim, B., Pearce, A., Viégas, F., Wattenberg, M.: Visualizing and measuring the geometry of BERT. Advances in Neural Information Processing Systems (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. Human Language Technologies, North American Chapter of the Association for Computational Linguistics (2019)
Firth, J.R.: A Synopsis of Linguistic Theory (1957)
Fonarev, A., Grinchuk, O., Gusev, G., Serdyukov, P., Oseledets, I.: Riemannian optimization for skip-gram negative sampling. In: Proceedings of the Association for Computational Linguistics, pp. 2028–2036 (2017)
Guy, L.: Riemannian geometry and statistical machine learning. Ph.D. Thesis, Carnegie Mellon University (2005)
Hewitt, J., Manning, C.: A structural probe for finding syntax in word representations. In: North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4129–4138 (2019)
Ichimori, T.: On rounding off quotas to the nearest integers in the problem of apportionment. JSIAM Lett. 3, 21–24 (2011)
Jawanpuria, P., Balgovind, A., Kunchukuttan, A., Mishra, B.: Learning multilingual word embeddings in latent metric space: a geometric approach. Trans. Assoc. Comput. Linguist. 7, 107–120 (2019)
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Comput. IEEE 42(8), 30–37 (2009)
Krishnamurthy, B., Puri, N., Goel, R.: Learning vector-space representations of items for recommendations using word embedding models. Procedia Comput. Sci. 80, 2205–2210 (2016)
Lauritzen, S.L.: Statistical manifolds. Differential geometry in statistical inference, pp. 163–216 (1987)
Lebanon, G.: Metric learning for text documents. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 497–508 (2006)
Lee, L.S.Y.: On the linear algebraic structure of distributed word representations. arXiv:1511.06961 (2015)
Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems (2014)
Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3, 211–225 (2015)
Meng, Y., Huang, J., Wang, G., Zhang, C., Zhuang, H., Kaplan, L., Han, J.: Spherical text embedding. Advances in Neural Information Processing Systems (2019)
Michel, P., Ravichander, A., Rijhwani, S.: Does the geometry of word embeddings help document classification? A case study on persistent homology-based representations. In: Proceedings of the 2nd Workshop on Representation Learning for NLP (2017)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: International Conference on Learning Representations (2013)
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: Annual Conference of the International Speech Communication Association (2010)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (2013)
Mikolov, T., Yih, W.T., Zweig, G.: Linguistic regularities in continuous space word representations. NAACL-HLT (2013)
Mu, J., Bhat, S., Viswanath, P.: All-but-the-top: simple and effective postprocessing for word representations. ICLR (2018)
Nagaoka, H., Amari, S.I.: Differential geometry of smooth families of probability distributions. Tech. rep., Technical Report METR 82-7, Univ. Tokyo (1982)
Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems (2017)
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving Language Understanding by Generative Pre-Training. Computer Science (2018)
Raunak, V.: Simple and Effective Dimensionality Reduction for Word Embeddings. LLD Workshop NIPS (2017)
Rudolph, M., Ruiz, F., Mandt, S., Blei, D.: Exponential family embeddings. Advances in Neural Information Processing Systems (2016)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Sugawara, K., Kobayashi, H., Iwasaki, M.: On approximately searching for similar word embeddings. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (2016)
Tifrea, A., Bécigneul, G., Ganea, O.E.: Poincaré GloVe: hyperbolic word embeddings. In: International Conference on Learning Representations (2019)
Volpi, R., Malagò, L.: Evaluating natural alpha embeddings on intrinsic and extrinsic tasks. In: Proceedings of the 5th Workshop on Representation Learning for NLP (2020)
Volpi, R., Thakur, U., Malagò, L.: Changing the geometry of representations: \(\alpha \)-embeddings for nlp tasks (submitted) (2020)
Wada, J.: A divisor apportionment method based on the Kolm–Atkinson social welfare function and generalized entropy. Math. Soc. Sci. 63(3), 243–247 (2012)
Wikiextractor. https://github.com/attardi/wikiextractor. Accessed 2017-10
Wu, L., Fisch, A., Chopra, S., Adams, K., Bordes, A., Weston, J.: StarSpace: embed all the things! arXiv:1709.03856 (2018)
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: Conference on Neural Information Processing Systems (2019)
Zhao, X., Louca, R., Hu, D., Hong, L.: Learning item-interaction embeddings for user recommendations. arXiv:1812.04407 (2018)
Acknowledgements
The authors are supported by the DeepRiemann project, co-funded by the European Regional Development Fund and the Romanian Government through the Competitiveness Operational Programme 2014–2020, Action 1.1.4, project ID P_37_714, Contract No. 136/27.09.2016.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below are the links to the electronic supplementary material.
Supplementary material 1 (mp4 69 KB)
Supplementary material 2 (mp4 211 KB)
Appendix A: GloVe training
During the training of GloVe we monitor performance in terms of accuracy on the word analogy task, in comparison with the literature; see Table 5.
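The word analogy task answers queries of the form "a is to b as c is to ?". A common scoring scheme, which we sketch below under the assumption that the standard 3CosAdd rule of Mikolov et al. is used (the appendix does not spell out the exact protocol), returns the vocabulary word whose unit-normalized embedding is closest in cosine similarity to \(b - a + c\), excluding the three query words.

```python
import numpy as np

def analogy(emb, vocab, a, b, c):
    """Answer 'a is to b as c is to ?' with 3CosAdd.

    emb: (V, d) array of word embeddings.
    vocab: dict mapping each word to its row index in emb.
    Returns the highest-scoring word, excluding a, b, c.
    """
    # Unit-normalize rows so dot products are cosine similarities.
    E = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    ia, ib, ic = vocab[a], vocab[b], vocab[c]
    sims = E @ (E[ib] - E[ia] + E[ic])
    for i in (ia, ib, ic):      # query words are never valid answers
        sims[i] = -np.inf
    words = list(vocab)
    return words[int(np.argmax(sims))]
```

Accuracy is then the fraction of benchmark analogies (e.g. "man : king :: woman : queen") for which the returned word matches the gold answer; as noted earlier, the arg max is assumed unique.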
Rights and permissions
About this article
Cite this article
Volpi, R., Malagò, L. Natural alpha embeddings. Info. Geo. 4, 3–29 (2021). https://doi.org/10.1007/s41884-021-00043-9