Data Mining When Each Data Point is a Network

  • Conference paper
Patterns of Dynamics (PaDy 2016)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 205)

Abstract

We discuss the problem of extending data mining approaches to cases in which each data point is an individual graph. Finding the intrinsic low dimensionality of an ensemble of graphs can be useful in a variety of modeling contexts, especially when the goal is to coarse-grain the detailed graph information. One of the main challenges in mining graph data is defining a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on subgraph densities, and one based on spectral information. The approach is illustrated on three test data sets (ensembles of graphs): two are obtained from standard graph-generating algorithms in the literature, while the graphs in the third are sampled as dynamic snapshots from an evolving network simulation. We further combine these approaches with equation-free techniques, demonstrating how such data mining can enhance scientific computation of network evolution dynamics.
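As a rough illustration of the subgraph-density idea mentioned above, the sketch below maps each graph to a short vector of motif densities and compares graphs through distances between those vectors. It is not the authors' implementation: the choice of three motifs (edge, two-star, triangle), the Euclidean distance, and the names `subgraph_densities`, `pairwise_distances`, and `erdos_renyi` are all assumptions made for this example.

```python
import numpy as np

def subgraph_densities(A):
    """Edge, two-star and triangle densities of a simple undirected graph.

    A is a symmetric 0/1 adjacency matrix with zero diagonal and n >= 3 nodes.
    Each count is divided by its maximum possible value on n nodes.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    edges = deg.sum() / 2.0
    two_stars = (deg * (deg - 1) / 2.0).sum()   # paths of length 2, by center
    triangles = np.trace(A @ A @ A) / 6.0       # closed 3-walks / 6
    return np.array([
        edges / (n * (n - 1) / 2.0),
        two_stars / (n * (n - 1) * (n - 2) / 2.0),
        triangles / (n * (n - 1) * (n - 2) / 6.0),
    ])

def pairwise_distances(graphs):
    """Euclidean distances between the density vectors of all graph pairs."""
    F = np.array([subgraph_densities(A) for A in graphs])
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def erdos_renyi(n, p, rng):
    """Sample a G(n, p) adjacency matrix (hypothetical helper for the demo)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

# Small ensemble: two sparse graphs and one denser graph.
rng = np.random.default_rng(0)
ensemble = [erdos_renyi(100, 0.1, rng),
            erdos_renyi(100, 0.1, rng),
            erdos_renyi(100, 0.5, rng)]
D = pairwise_distances(ensemble)
```

Because the densities are normalized by graph size, the same feature vector can be computed for graphs with different numbers of nodes. A distance matrix such as `D` is the kind of input that a manifold-learning method would take in order to look for low-dimensional structure in the ensemble.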

To Bernold Fiedler, with admiration for his choice of research problems in mathematics and modeling, and for what he has taught us about them.

Notes

  1. Note that an alternative, equivalent way to define the similarity measure would be to compare directly the contributions of the different eigenvectors to \(S_i\), instead of summing the contributions and then using different values of \(\lambda\). However, this approach is difficult to generalize to cases where the graphs vary in size.


Acknowledgements

The work of IGK was partially supported by the US National Science Foundation, as well as by AFOSR (Dr. Darema) and DARPA contract HR0011-16-C-0016.

Author information

Corresponding author

Correspondence to Karthikeyan Rajendran.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Rajendran, K., Kattis, A., Holiday, A., Kondor, R., Kevrekidis, I.G. (2017). Data Mining When Each Data Point is a Network. In: Gurevich, P., Hell, J., Sandstede, B., Scheel, A. (eds) Patterns of Dynamics. PaDy 2016. Springer Proceedings in Mathematics & Statistics, vol 205. Springer, Cham. https://doi.org/10.1007/978-3-319-64173-7_17
