Abstract
Graphs provide powerful abstractions of relational data and are widely used in fields such as network management, web page analysis and sociology. While many graph representations of data describe dynamic, time-evolving relationships, most graph mining work treats graphs as static entities. Our focus in this paper is to discover regions of a graph that are evolving in a similar manner. To discover regions of correlated spatio-temporal change in graphs, we propose an algorithm called cSTAG. Whereas most clustering techniques are designed to find clusters that optimise a single distance measure, cSTAG addresses the problem of finding clusters that optimise both temporal and spatial distance measures simultaneously. We show the effectiveness of cSTAG using a quantitative analysis of accuracy on synthetic data sets, and demonstrate its utility on two large, real-life data sets: the routing topology of the Internet, and the dynamic graph of files accessed together on the 1998 World Cup official website.
Cite this article
Chan, J., Bailey, J. & Leckie, C. Discovering correlated spatio-temporal changes in evolving graphs. Knowl Inf Syst 16, 53–96 (2008). https://doi.org/10.1007/s10115-007-0117-z
DOI: https://doi.org/10.1007/s10115-007-0117-z