Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG Model

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7818))

Abstract

If Alice has double the friends of Bob, will she also have double the phone calls (or wall postings, or tweets)? Our first contribution is the discovery that the relative frequencies obey a power law (sub-linear or super-linear) across a wide variety of settings: tasks in a phone-call network, such as count of friends, count of phone calls, and total count of minutes; and tasks in a Twitter-like network, such as count of tweets and count of followees. Our second contribution is a full, digitized 2-d distribution, which we call the Almond-DG model because of the shape of its iso-surfaces. The Almond-DG model matches all our empirical observations: super-linear relationships among variables, and (provably) log-logistic marginals. We illustrate our observations on two large, real network datasets, spanning ~2.2M and ~3.1M individuals with 5 features each. We show how to use our observations to spot clusters and outliers, such as telemarketers in our phone-call network.
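
To make the opening question concrete: if calls grow as c · friends^α, then doubling the friends multiplies the calls by 2^α, which is more than double whenever α > 1 (super-linear). The sketch below is only an illustration under stated assumptions, not the paper's fitting procedure. It estimates α by ordinary least squares on log-transformed values. The function name `fit_power_law`, the synthetic data, and the chosen exponent 1.2 are all hypothetical and are not taken from the paper's datasets.

```python
import math
import random


def fit_power_law(xs, ys):
    """Fit y ~ c * x**alpha by least squares on (log x, log y).

    Returns (alpha, c). Pairs with a non-positive value are skipped,
    because their logarithm is undefined.
    """
    pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive (x, y) pairs")
    mean_lx = sum(lx for lx, _ in pairs) / n
    mean_ly = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_lx) ** 2 for lx, _ in pairs)
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in pairs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    alpha = sxy / sxx                       # slope in log-log space
    c = math.exp(mean_ly - alpha * mean_lx)  # intercept mapped back to linear scale
    return alpha, c


# Synthetic stand-in data (assumption): calls = 3 * friends**1.2 with log-normal noise.
rng = random.Random(0)
friends = [rng.randint(1, 500) for _ in range(2000)]
calls = [3.0 * f ** 1.2 * math.exp(rng.gauss(0.0, 0.3)) for f in friends]

alpha, c = fit_power_law(friends, calls)
doubling_factor = 2 ** alpha  # how much calls grow when friends double
print(f"alpha = {alpha:.3f}, c = {c:.3f}, doubling factor = {doubling_factor:.3f}")
```

A fitted α above 1 would correspond to the super-linear case named in the abstract, and α below 1 to the sub-linear case. A plain log-log regression is a rough diagnostic only; it says nothing about the joint 2-d distribution that the Almond-DG model describes.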

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koutra, D., Koutras, V., Prakash, B.A., Faloutsos, C. (2013). Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG Model. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7818. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37453-1_17

  • DOI: https://doi.org/10.1007/978-3-642-37453-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37452-4

  • Online ISBN: 978-3-642-37453-1

  • eBook Packages: Computer Science, Computer Science (R0)
