DeepBrowse: Similarity-Based Browsing Through Large Lists (Extended Abstract)

  • Conference paper
Similarity Search and Applications (SISAP 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10609)

Abstract

We propose a new approach for browsing through large lists in the absence of a predefined hierarchy. DeepBrowse is defined by the interaction of two fixed, globally-defined permutations on the space of objects: one ordering the items by similarity, the second based on magnitude or importance. We demonstrate this paradigm through our WikiBrowse app for discovering interesting Wikipedia pages, which enables the user to scan similar related entities and then increase depth once a region of interest has been found.
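The interaction of the two orders can be illustrated with a minimal sketch. The abstract does not specify how depth maps to the number of visible items, so the rule used here (depth d admits the 2**d most important items) and the names `browse`, `center`, and `width` are assumptions made purely for illustration, not the WikiBrowse implementation.

```python
from typing import List, Sequence


def browse(similarity_order: Sequence[str],
           importance_order: Sequence[str],
           depth: int,
           center: int,
           width: int = 3) -> List[str]:
    """Return up to `width` visible items nearest to position `center`.

    similarity_order: all items, arranged so that neighbors are similar.
    importance_order: the same items, most important first.
    Assumption (hypothetical rule): depth d admits the 2**d most
    important items; the source does not state the actual rule.
    """
    admitted = set(importance_order[: 2 ** depth])
    pos = {item: i for i, item in enumerate(similarity_order)}
    # Keep only admitted items, preserving the similarity order.
    shown = [item for item in similarity_order if item in admitted]
    # Pick the admitted items closest to the user's current position.
    nearest = sorted(shown, key=lambda item: abs(pos[item] - center))[:width]
    # Present them in similarity order again.
    return sorted(nearest, key=pos.__getitem__)


similarity_order = ["a", "b", "c", "d", "e", "f", "g", "h"]
importance_order = ["b", "f", "d", "h", "e", "c", "a", "g"]

shallow = browse(similarity_order, importance_order, depth=1, center=0)
deeper = browse(similarity_order, importance_order, depth=2, center=3)
deepest = browse(similarity_order, importance_order, depth=3, center=3)
```

At low depth only the most important items appear, spread widely across the similarity order; increasing depth around a fixed position fills in less prominent neighbors of the region of interest.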

Constructing good similarity orders of large collections of complex objects is a challenging task. Graph embeddings are assignments of vertices to points in space that reflect the structure of any underlying similarity or relatedness network. We propose the use of graph embeddings (DeepWalk) to provide the features to order items by similarity.
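As a rough illustration of where such features come from, DeepWalk generates truncated random walks over the graph and treats them as sentences for a skip-gram model. The sketch below covers only the walk-generation step on a toy adjacency list; the function name, parameters, and toy graph are assumptions for illustration, and the skip-gram training that turns walks into vectors is left to an off-the-shelf word2vec implementation and is not shown.

```python
import random
from typing import Dict, List


def random_walks(graph: Dict[str, List[str]],
                 walks_per_node: int,
                 walk_length: int,
                 seed: int = 0) -> List[List[str]]:
    """Generate truncated uniform random walks starting from every vertex."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start vertices in random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph given as adjacency lists (hypothetical data).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(toy_graph, walks_per_node=2, walk_length=5, seed=1)
```

Vertices that co-occur often in these walks end up with nearby vectors after skip-gram training, which is what makes the embedding usable as a similarity feature.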

The problem of ordering items in a list by similarity is naturally modeled by the Traveling Salesman Problem (TSP), which seeks the minimum-cost tour visiting the complete set of items. We introduce a new variant of TSP designed to more effectively order vertices so as to reflect longer-range similarity. We present interesting combinatorial and algorithmic properties of this formulation, and demonstrate that it works effectively to organize large product universes.
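For intuition about the classical formulation only, the sketch below orders embedded points with the standard nearest-neighbor heuristic for TSP. It is not the new variant introduced in the paper, whose definition the abstract does not give; it also produces an open path rather than a closed tour, since a browsing list has two ends. The function names and sample points are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_neighbor_order(points: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy ordering: repeatedly move to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Break distance ties by index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_cost(points: Sequence[Point], order: Sequence[int]) -> float:
    """Total Euclidean length of the path that visits points in `order`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


# Two clusters on a line (hypothetical embedding coordinates).
points = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (11.0, 0.0), (2.0, 0.0)]
order = nearest_neighbor_order(points)
cost = path_cost(points, order)
```

The greedy path keeps each cluster contiguous in the list, but it optimizes only adjacent distances, which is the limitation a formulation aimed at longer-range similarity would need to address.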

Author information

Corresponding author

Correspondence to Steven Skiena.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chen, H., Anantharam, A.R., Skiena, S. (2017). DeepBrowse: Similarity-Based Browsing Through Large Lists (Extended Abstract). In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds) Similarity Search and Applications. SISAP 2017. Lecture Notes in Computer Science, vol 10609. Springer, Cham. https://doi.org/10.1007/978-3-319-68474-1_21

  • DOI: https://doi.org/10.1007/978-3-319-68474-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68473-4

  • Online ISBN: 978-3-319-68474-1

  • eBook Packages: Computer Science, Computer Science (R0)
