A LexDFS-Based Approach on Finding Compact Communities

From Social Data Mining and Analysis to Prediction and Community Detection

Abstract

This article presents an efficient hierarchical clustering algorithm based on a graph traversal algorithm called LexDFS. This traversal has the property of going through the clustered parts of the graph in a small number of iterations, which makes them recognisable. The time complexity of our method is O(n × log(n)). It is simple to implement, and a thorough study shows that it outputs clusterings that are closer to some ground truths than those of its competitors. Experiments are also carried out to analyse the behaviour of the algorithm during execution on sample graphs. This article also features a quality function called compactness, which measures how efficient a cluster is for internal communication. We prove that this quality function has interesting theoretical properties.


Notes

  1. http://snap.stanford.edu/data/.


Acknowledgements

The authors thank Loïck Lhote for his help with the proof of continuity.

Corresponding author

Correspondence to Jean Creusefond.


1 Appendix: Proofs of Axioms Compliance

For technical reasons, we also need to define compactness when applied to disconnected clusters. The quality of a cluster on which information cannot spread is the lowest in our model, therefore the compactness of a disconnected cluster is defined to be zero.

1.1 Specific Notations

Some of the previous notations omitted the graph dependency for brevity (length, distance, diameter, compactness, etc.). When there is ambiguity, the graph is specified as a subscript. A connected graph in this context is a graph G = (V, w) for which \(\forall (u,v) \in V \times V\), dist(u, v) is defined, or equivalently there exists a path \(\pi = (a_{0} = u,a_{1},\ldots,a_{\vert \pi \vert } = v) \in \text{paths}(u,v)\) between u and v such that \(\forall i \in [0: \vert \pi \vert - 1]\), \(w(a_{i},a_{i+1}) > 0\).

\(\mathcal{P}(V )\) is the powerset of a set V of nodes, that is the set of all possible clusters.

\(\mathcal{C}(V )\) is the set of possible partitions of a set V of nodes, that is \(\mathcal{C}(V ) =\{\{ c_{1},\ldots,c_{k}\} \subseteq \mathcal{P}(V ):\ \bigcup _{i}c_{i} = V,\ c_{i} \cap c_{j} = \emptyset \text{ for } i\neq j\}\).

1.2 Permutation Invariance

Definition 1.

A graph clustering quality function Q is permutation invariant if for all graphs G = (V, w) and all isomorphisms f: V → V′, it is the case that \(Q_{G}(C) = Q_{f(G)}(f(C))\), where f is extended to graphs and clusterings by f(C) = {{ f(i) | i ∈ c} | c ∈ C} and \(f((V,w)) = (V ',(i,j)\mapsto w(f^{-1}(i),f^{-1}(j)))\).

Proposition 1.

Compactness is permutation invariant

Sketch of proof: Compactness only uses internal edges as an input, therefore it does not depend on representation.

Proof.

First, distances on weighted graphs as defined previously are permutation invariant:

$$\displaystyle\begin{array}{rcl} \text{dist}_{f(G)}(f(u),f(v))& =& \text{min}_{\pi \in \text{paths}_{ f(G)}(f(u),f(v))}(\text{len}_{f(G)}(\pi )) {}\\ & =& \text{min}_{\pi \in \text{paths}_{ f(G)}(f(u),f(v))}(\text{len}_{G}(f^{-1}(\pi ))) {}\\ & =& \text{min}_{\pi '\in \text{paths}_{G}(u,v)}(\text{len}_{G}(\pi ')) {}\\ & =& \text{dist}_{G}(u,v) {}\\ \end{array}$$

Compactness is permutation invariant:

$$\displaystyle\begin{array}{rcl} L_{f(G)}(f(C))& =& \sum _{f(c)\in f(C)}\sum _{(f(i),f(j))\in f(c)^{2}} \frac{w_{f(G)}(f(i),f(j))} {\text{max}_{(f(u),f(v))\in f(c)^{2}}(\text{dist}_{f(G)}(f(u),f(v)))} {}\\ & =& \sum _{c\in C}\sum _{(i,j)\in c^{2}} \frac{w(i,j)} {\text{max}_{(u,v)\in c^{2}}(\text{dist}_{G}(u,v))} {}\\ & =& L_{G}(C) {}\\ \end{array}$$
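The invariance can also be checked numerically on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes an undirected graph stored as a dictionary from node pairs to weights, takes the length of an edge to be 1∕w as in the proofs of this appendix, and computes the diameter inside the subgraph induced by each cluster. The function name `compactness`, the node labels and the relabelling `f` are illustrative assumptions.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter.

    weights: dict mapping frozenset({u, v}) to a non-negative weight.
    Singleton and disconnected clusters contribute zero.
    """
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; missing edges are infinitely long.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


weights = {frozenset("ab"): 1.0, frozenset("bc"): 2.0, frozenset("de"): 4.0}
clustering = [{"a", "b", "c"}, {"d", "e"}]

# Hypothetical relabelling f: V -> V' applied to both the graph and the clustering.
f = {"a": 10, "b": 11, "c": 12, "d": 13, "e": 14}
perm_weights = {frozenset(f[u] for u in edge): w for edge, w in weights.items()}
perm_clustering = [{f[u] for u in c} for c in clustering]

original = compactness(weights, clustering)
relabelled = compactness(perm_weights, perm_clustering)
print(original, relabelled)
```

Both calls return the same score, since only the internal weights and the internal distances enter the computation.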

1.3 Scale Invariance

Definition 2.

A graph clustering quality function Q is scale invariant if for all graphs G = (V, w), all clusterings \(C_{1}\), \(C_{2}\) of G and all constants α > 0, \(Q_{G}(C_{1}) \leq Q_{G}(C_{2})\) if and only if \(Q_{\alpha G}(C_{1}) \leq Q_{\alpha G}(C_{2})\), where \(\alpha G = (V,(i,j)\mapsto \alpha w(i,j))\) is the graph with edge weights scaled by a factor α.

Proposition 2.

Compactness is scale invariant

Sketch of proof: Multiplying the edge weights by a factor α multiplies the diameter by 1∕α. Since the numerator of every term of the compactness sum is multiplied by α, the total score is multiplied by \(\alpha ^{2}\), therefore the ranking of the clusterings is preserved.

Proof.

$$\displaystyle\begin{array}{rcl} \text{len}_{\alpha G}(v_{1},\ldots,v_{k})& =& \sum _{i=1}^{k-1} \frac{1} {\alpha w(v_{i},v_{i+1})} {}\\ \text{len}_{\alpha G}(v_{1},\ldots,v_{k})& =& \frac{len_{G}(v_{1},\ldots,v_{k})} {\alpha } {}\\ \end{array}$$

Since the lengths of all paths are divided by the same factor α, the minimum paths are the same in G and in αG, therefore:

$$\displaystyle\begin{array}{rcl} \text{dist}_{\alpha G}(u,v)& =& \text{min}_{\pi \in \text{paths}_{\alpha G}(u,v)}\left (\text{len}_{\alpha G}(\pi )\right ) {}\\ & =& \text{min}_{\pi \in \text{paths}_{\alpha G}(u,v)}(\frac{\text{len}_{G}(\pi )} {\alpha } ) {}\\ & =& \frac{\text{dist}_{G}(u,v)} {\alpha } {}\\ \end{array}$$

Using the same reasoning, \(\text{diam}_{\alpha G}(c) = \text{diam}_{G}(c)/\alpha\), so the compactness can be written as:

$$\displaystyle\begin{array}{rcl} L_{\alpha G}(C)& =& \sum _{c\in C}\sum _{(i,j)\in c^{2}} \frac{w_{\alpha G}(i,j)} {\text{diam}_{\alpha G}(c)} {}\\ & =& \sum _{c\in C}\sum _{(i,j)\in c^{2}}\alpha ^{2} \frac{w_{G}(i,j)} {\text{diam}_{G}(c)} {}\\ & =& \alpha ^{2}L_{ G}(C) {}\\ \end{array}$$

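Therefore, for all clusterings \(C_{1}\), \(C_{2}\), and since \(\alpha ^{2} > 0\), \(L_{G}(C_{1}) \geq L_{G}(C_{2})\) if and only if \(\alpha ^{2}L_{G}(C_{1}) \geq \alpha ^{2}L_{G}(C_{2})\), that is, if and only if \(L_{\alpha G}(C_{1}) \geq L_{\alpha G}(C_{2})\).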
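The α² factor derived above can be observed on a small example. This is a sketch under the same assumptions as the proof (edge length 1∕w, diameter taken inside each cluster); the dictionary representation, the helper name `compactness` and the toy graph are illustrative choices, not part of the chapter.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter."""
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; missing edges are infinitely long.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue  # singleton or disconnected cluster scores zero
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


weights = {frozenset("ab"): 1.0, frozenset("bc"): 2.0, frozenset("de"): 4.0}
c1 = [{"a", "b", "c"}, {"d", "e"}]
c2 = [{"a", "b"}, {"c"}, {"d", "e"}]

alpha = 2.0
scaled = {edge: alpha * w for edge, w in weights.items()}

base_1, base_2 = compactness(weights, c1), compactness(weights, c2)
scaled_1, scaled_2 = compactness(scaled, c1), compactness(scaled, c2)
print(base_1, base_2, scaled_1, scaled_2)
```

Each score is multiplied by α² = 4, so the clustering preferred on G is also preferred on αG.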

1.4 Richness

Definition 3.

A graph clustering quality function Q is rich if for all sets V and all non-trivial partitions C of V, there is a graph G = (V, w) such that C is the Q-optimal clustering of V, i.e. \(\text{argmax}_{C'}Q_{G}(C') = C\).

Proposition 3.

Compactness is rich

Sketch of proof: For a graph in which all clusters in C are cliques and there is no edge between clusters, C is the most compact clustering since any other has either:

  • disconnected clusters, which have a zero compactness, and separating them into connected clusters improves the compactness

  • multiple clusters including nodes belonging in the same clique, and merging these clusters adds the edges that are between them to the original score

Proof.

\(\forall C^{{\ast}}\in \mathcal{C}(V )\), let G = (V, w) be a graph such that, \(\forall (i,j) \in V ^{2}\), w(i, j) = 1 if \(\exists c \in C^{{\ast}}\) such that i ∈ c and j ∈ c (i and j belong to the same cluster) and w(i, j) = 0 otherwise. G is therefore a clique graph (all connected components are cliques), where the disconnected cliques are the clusters in \(C^{{\ast}}\). Let C be an optimal clustering of G w.r.t. compactness, i.e. \(\text{argmax}_{D\in \mathcal{C}(V )}L_{G}(D) = C\).

If \(\exists c \in C\) such that \(\exists (i,j) \in c^{2}\), w(i, j) = 0, then i and j are not in the same cluster in \(C^{{\ast}}\), and c is disconnected, therefore its compactness equals zero. When separating c into multiple connected clusters \((c_{1},\ldots,c_{k})\), note that the compactness of each of these clusters is non-negative, and therefore greater than or equal to the compactness of c. Let C′ be the clustering such that \(C' = C\setminus \{c\} \cup \{ c_{1},\ldots,c_{k}\}\). Therefore

$$\displaystyle\begin{array}{rcl} L(C')& =& L(C\setminus c) + L(\{c_{1},\ldots,c_{k}\}) {}\\ & \geq & L(C\setminus c) + L(c) {}\\ \Leftrightarrow L(C')& \geq & L(C) {}\\ \end{array}$$

The scores are equal if and only if all nodes in c have zero degree. Therefore, \(\forall c \in C\), c is connected in G or composed of zero-degree nodes (connectedness condition).

If \(\exists (c_{1},c_{2}) \in C^{2},c_{1}\neq c_{2}\) such that \(\exists i \in c_{1}\), \(j \in c_{2}\), w(i, j) = 1 (i.e. there is an edge between two clusters in C), then i and j are in the same cluster in \(C^{{\ast}}\) and in different clusters in C. Because the nodes i and j have a non-zero degree, the clusters \(c_{1}\) and \(c_{2}\) are both connected. Noting that the distance between any connected pair of nodes is always 1 on this graph, we have \(\text{diam}(c_{1}) = \text{diam}(c_{2}) = \text{diam}(c_{1} \cup c_{2}) = 1\). We call C′ the clustering obtained from C by replacing \(c_{1}\) and \(c_{2}\) with their union, that is \(C' = C\setminus \{c_{1},c_{2}\} \cup \{ c_{1} \cup c_{2}\}\).

$$\displaystyle\begin{array}{rcl} L(C')& =& L(C\setminus \{c_{1},c_{2}\}) + L(\{c_{1} \cup c_{2}\}) {}\\ & \geq & L(C\setminus \{c_{1},c_{2}\}) + L(c_{1}) + L(c_{2}) + \frac{w(i,j)} {\text{diam}(\{c_{1} \cup c_{2}\})} {}\\ & >& L(C\setminus \{c_{1},c_{2}\}) + L(c_{1}) + L(c_{2}) {}\\ \Leftrightarrow L(C')& >& L(C) {}\\ \end{array}$$

Since C is optimal w.r.t. L, there is no edge between the clusters of C, which is equivalent to \(\forall c \in C\), \(\exists c' \in C^{{\ast}}\), c′ ⊆ c (maximality condition).

Both conditions imply that any cluster in C is either a maximal connected component (and therefore a cluster in \(C^{{\ast}}\)) or a set of zero-degree nodes. Since any set containing a zero-degree node has the same compactness (zero), \(C^{{\ast}}\) has the same compactness as any other maximal clustering. Therefore, \(C^{{\ast}}\) is a maximum-compactness clustering of G.
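The construction used in this proof can be verified exhaustively on a very small node set. The sketch below assumes the clique graph of the proof (weight 1 inside each target cluster, no edge between clusters) and enumerates every partition of five nodes; the helper names `compactness` and `partitions` and the target clustering are illustrative assumptions, and brute force is only viable at this toy size.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter."""
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; missing edges are infinitely long.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue  # singleton or disconnected cluster scores zero
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


def partitions(items):
    """Yield every partition of the list `items` as a list of sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):  # add `first` to an existing block
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [{first}]  # or give `first` a block of its own


# Target clustering C*: one clique per cluster, no edge between clusters.
target = [{0, 1, 2}, {3, 4}]
weights = {frozenset(pair): 1.0
           for c in target for pair in itertools.combinations(sorted(c), 2)}
nodes = sorted(set().union(*target))

best = max(partitions(nodes), key=lambda p: compactness(weights, p))
print(best, compactness(weights, best))
```

On this instance the target clustering is the unique maximiser: splitting a clique loses internal edges, and merging the two cliques yields a disconnected cluster with zero compactness.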

1.5 Monotonicity

Definition 4.

Let G = (V, w) be a graph and C a clustering of G. A graph G′ = (V, w′) is a C-consistent improvement of G if for all nodes i and j, w′(i, j) ≥ w(i, j) whenever i is in the same community as j and w′(i, j) ≤ w(i, j) whenever i is not in the same community as j.

Definition 5.

A graph clustering quality function Q is monotonic if for all graphs G, all clusterings C of G and all C-consistent improvements G′ of G it is the case that \(Q_{G'}(C) \geq Q_{G}(C)\).

Proposition 4.

Compactness is monotonic

Sketch of proof: Compactness is not influenced by between-clusters weights, and increasing the weight inside clusters can only decrease or not affect the diameter. Therefore, compactness is either unaffected or increased by such a modification.

Proof.

Because compactness is insensitive to the weight of external edges, we focus on intra-cluster edges.

\(\forall c \in C\), \(\forall \varPi\) a path in the subgraph of G induced by c, \(\text{len}_{G}(\varPi ) \geq \text{len}_{G'}(\varPi )\) (since all weights inside c have increased or stayed the same). Since this is true for all paths, \(\text{diam}_{G}(c) \geq \text{diam}_{G'}(c)\). Therefore, \(L_{G}(c) =\sum _{i,j\in c^{2}}w(i,j)/\text{diam}_{G}(c) \leq \sum _{i,j\in c^{2}}w'(i,j)/\text{diam}_{G'}(c) = L_{G'}(c)\).

A consistent improvement thus implies an equal or increased compactness, which proves monotonicity.
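A small numerical illustration of this argument follows. It is only a sketch: the graph, the clustering and the helper name `compactness` are assumptions made for the example. The improved graph strengthens one intra-cluster edge and weakens the single inter-cluster edge, which is a C-consistent improvement in the sense of Definition 4.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter."""
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; missing edges are infinitely long.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue  # singleton or disconnected cluster scores zero
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


clustering = [{"a", "b", "c"}, {"d", "e"}]

# G: a path a-b-c-d-e with unit weights; the edge c-d joins the two clusters.
g = {frozenset("ab"): 1.0, frozenset("bc"): 1.0,
     frozenset("cd"): 1.0, frozenset("de"): 1.0}

# G': the intra-cluster edge a-b is heavier, the inter-cluster edge c-d is lighter.
g_improved = {frozenset("ab"): 2.0, frozenset("bc"): 1.0,
              frozenset("cd"): 0.5, frozenset("de"): 1.0}

before = compactness(g, clustering)
after = compactness(g_improved, clustering)
print(before, after)
```

The change to the inter-cluster edge leaves the score untouched, while the heavier internal edge both raises the numerator and shortens the diameter of the first cluster.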

1.6 Locality

Definition 6.

Let \(G_{1} = (V _{1},w_{1})\) and \(G_{2} = (V _{2},w_{2})\) be two graphs and let \(V _{a} \subseteq V _{1} \cap V _{2}\) be a subset of the common nodes. We say that the graphs agree on \(V _{a}\) if \(w_{1}(i,j) = w_{2}(i,j)\) for all \(i,j \in V _{a}\). We say that the graphs also agree on the neighbourhood of \(V _{a}\) if

  • \(w_{1}(i,j) = w_{2}(i,j)\) for all \(i \in V _{a}\) and \(j \in V _{1} \cap V _{2}\),

  • \(w_{1}(i,j) = 0\) for all \(i \in V _{a}\) and \(j \in V _{1}\setminus V _{2}\), and

  • \(w_{2}(i,j) = 0\) for all \(i \in V _{a}\) and \(j \in V _{2}\setminus V _{1}\).

This means that for nodes in V a the weights and endpoints of incident edges are exactly the same in the two graphs.

Definition 7.

A graph clustering quality function Q is local if for all graphs \(G_{1} = (V _{1},w_{1})\) and \(G_{2} = (V _{2},w_{2})\) that agree on a set \(V _{a}\) and its neighbourhood, and for all clusterings \(C_{a}\), \(D_{a}\) of \(V _{a}\), \(C_{1}\) of \(V _{1}\setminus V _{a}\) and \(C_{2}\) of \(V _{2}\setminus V _{a}\), if \(Q_{G_{1}}(C_{a} \cup C_{1}) \geq Q_{G_{1}}(D_{a} \cup C_{1})\) then \(Q_{G_{2}}(C_{a} \cup C_{2}) \geq Q_{G_{2}}(D_{a} \cup C_{2})\).

Proposition 5.

Compactness is local

Sketch of proof: Thanks to additivity properties of compactness and the fact that it only uses internal data, any clustering preference on a graph G is kept on a graph G′ that would include it.

Proof.

Let G 1 = (V 1, w 1) and G 2 = (V 2, w 2) be two graphs that agree on a set V a and its neighbourhood. By definition, \(\forall (i,j) \in V _{a}^{2}\), w 1(i, j) = w 2(i, j). Since compactness only takes into account internal edges and internal paths, and since G 1 and G 2 agree on V a and its internal edges, \(\forall c \in \mathcal{P}(V _{a})\), \(L_{G_{1}}(c) = L_{G_{2}}(c)\). Therefore, \(\forall C \in \mathcal{C}(V _{a})\), \(L_{G_{1}}(C) = L_{G_{2}}(C)\).

We immediately get \(\forall (C_{a},D_{a}) \in \mathcal{C}(V _{a})^{2}\), \(\forall C_{1} \in \mathcal{C}(V _{1}\setminus V _{a})\) and \(\forall C_{2} \in \mathcal{C}(V _{2}\setminus V _{a})\)

$$\displaystyle\begin{array}{rcl} & & L_{G_{1}}(C_{a} \cup C_{1}) \geq L_{G_{1}}(D_{a} \cup C_{1}) {}\\ & \Leftrightarrow & L_{G_{1}}(C_{a}) + L_{G_{1}}(C_{1}) \geq L_{G_{1}}(D_{a}) + L_{G_{1}}(C_{1}) {}\\ & \Leftrightarrow & L_{G_{1}}(C_{a}) \geq L_{G_{1}}(D_{a}) {}\\ & \Leftrightarrow & L_{G_{2}}(C_{a} \cup C_{2}) \geq L_{G_{2}}(D_{a} \cup C_{2}) {}\\ \end{array}$$
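The chain of equivalences above can be illustrated on two toy graphs that share the nodes a, b, c and differ elsewhere. This is a sketch only: both graphs, the clusterings and the helper name `compactness` are assumptions chosen for the example, with no edge leaving the shared node set so that the graphs agree on it and on its neighbourhood.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter."""
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; missing edges are infinitely long.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue  # singleton or disconnected cluster scores zero
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


# Shared part V_a = {a, b, c}: identical weights in both graphs.
shared = {frozenset("ab"): 1.0, frozenset("bc"): 2.0}
g1 = {**shared, frozenset("xy"): 3.0}                          # V_1 = V_a + {x, y}
g2 = {**shared, frozenset("pq"): 1.0, frozenset("qr"): 5.0}    # V_2 = V_a + {p, q, r}

c_a = [{"a", "b", "c"}]       # one clustering of V_a
d_a = [{"a", "b"}, {"c"}]     # another clustering of V_a
c_1 = [{"x", "y"}]            # clustering of V_1 \ V_a
c_2 = [{"p", "q"}, {"r"}]     # clustering of V_2 \ V_a

prefers_c_in_g1 = compactness(g1, c_a + c_1) >= compactness(g1, d_a + c_1)
prefers_c_in_g2 = compactness(g2, c_a + c_2) >= compactness(g2, d_a + c_2)
print(prefers_c_in_g1, prefers_c_in_g2)
```

The preference between the two clusterings of the shared part is the same in both graphs, because the contribution of the remaining clusters cancels on each side of the inequality.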

1.7 Continuity

Definition 8.

A quality function Q is continuous if a small change in the graph leads to a small change in the quality. Formally, Q is continuous if for every ε > 0 and every graph G = (V, w), there exists a δ > 0 such that for all graphs G′ = (V, w′), if w(i, j) − δ < w′(i, j) < w(i, j) + δ for all nodes i and j, then \(Q_{G'}(C)-\epsilon < Q_{G}(C) < Q_{G'}(C)+\epsilon\) for all clusterings C of G.

Proposition 6.

Compactness is continuous

Sketch of proof: First, we prove that the distance function is continuous for connected graphs. To that aim, we show that the distance on any Cauchy sequence of graphs converging to a given graph converges to the distance on that graph. The diameter is therefore continuous, and continuity is ensured on disconnected graphs by showing that the diameter goes to infinity when a graph gets close to being disconnected.

Proof.

We note that this definition of continuity corresponds to the standard continuity of a multivariate function, with the distance between two graphs being the maximum of the absolute difference in edge weights. We call this distance function d(G, G′). Therefore, we can use known properties, such as the continuity of the combination of continuous functions, etc.

Lemma 1.

For a connected graph G = (V,w), \(\forall (a,b) \in V \times V\) , dist G (a,b) is continuous.

Proof.

Let \(G_{n} = (V,w_{n})\) be a Cauchy sequence of graphs. Then, \(\forall (i,j) \in V ^{2}\), \((w_{n}(i,j))_{n\in \mathbb{N}}\) is also a Cauchy sequence, therefore \(\exists w\) such that \(w_{n}(i,j) \rightarrow w(i,j)\) (in this context, → means “converges to”) and a graph G = (V, w) such that \(G_{n} \rightarrow G\). We assume G to be connected.

\(\forall (a,b) \in V ^{2}\), let \(\varPi = (a_{0} = a,a_{1},\ldots,a_{k-1},a_{k} = b)\) be a path such that \(\text{dist}_{G}(a,b) = \text{len}_{G}(\varPi )\), that is a minimal path in G between a and b. If \(\exists i \in [0: k - 1]\) such that \(w(a_{i},a_{i+1}) = 0\), then \(\text{len}_{G}(\varPi )\) is undefined and therefore Π is not a minimal path. Since a minimal path exists (due to the definition of a connected graph), \(\forall i \in [0: k - 1]\), \(w(a_{i},a_{i+1}) > 0\).

Since \(\forall i \in [0: k - 1]\), \(w(a_{i},a_{i+1})\neq 0\), the function \(f(x_{0},\ldots,x_{k-1}) =\sum _{i\in [0:k-1]} \frac{1} {x_{i}}\) is continuous at \((w(a_{0},a_{1}),\ldots,w(a_{k-1},a_{k}))\). Therefore, \(\text{len}_{G_{n}}(\varPi ) \rightarrow \text{len}_{G}(\varPi )\).

Since \(\text{dist}_{G_{n}}(a,b) \leq \text{len}_{G_{n}}(\varPi )\),

$$\displaystyle\begin{array}{rcl} \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)& \leq & \limsup \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi ) {}\\ & =& \lim \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi ) = \text{len}_{G}(\varPi ) = \text{dist}_{G}(a,b) {}\\ \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)& \leq & \text{dist}_{G}(a,b) {}\\ \end{array}$$

\(\liminf \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)\), therefore

$$\displaystyle{ \liminf \limits _{n\rightarrow +\infty }dist_{G_{n}}(a,b) \leq \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b) }$$
(11)

Let ε > 0. Since \(\limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b)\), \(\exists n_{0} \in \mathbb{N}\) such that \(\forall n \geq n_{0}\), \(\text{dist}_{G_{n}}(a,b) \leq (1+\epsilon )\text{dist}_{G}(a,b)\).

Let \(n_{1} \in \mathbb{N}\) such that \(\forall n \geq n_{1}\), \(d(G,G_{n}) \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}\).

By definition, \(\forall (i,j) \in V \times V\), \(\vert w_{n}(i,j) - w(i,j)\vert \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}\).

For \(n \in \mathbb{N}\), let \(\varPi _{n}\) be a path such that \(\text{dist}_{G_{n}}(a,b) = \text{len}_{G_{n}}(\varPi _{n})\). If \(\varPi _{n}\) is not a path in G for \(n \geq \max (n_{0},n_{1})\), then for \(\varPi _{n} = (a_{0}^{(n)},\ldots,a_{k_{n}}^{(n)})\), \(\exists i \in [0: k_{n} - 1]\) such that \(w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})\neq 0\) and \(w(a_{i}^{(n)},a_{i+1}^{(n)}) = 0\). In that case,

$$\displaystyle{\vert w_{n}(a_{i}^{(n)},a_{ i+1}^{(n)})\vert = \vert w_{ n}(a_{i}^{(n)},a_{ i+1}^{(n)}) - w(a_{ i}^{(n)},a_{ i+1}^{(n)})\vert \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}}$$
$$\displaystyle{\Rightarrow \text{len}_{G_{n}}(\varPi _{n}) \geq \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} \geq 2 \times \text{dist}_{G}(a,b)}$$

However, this is contradictory, since \(\text{len}_{G_{n}}(\varPi _{n}) = \text{dist}_{G_{n}}(a,b) \leq (1+\epsilon )\text{dist}_{G}(a,b)\) for \(n \geq n_{0}\). Then, for \(n \geq \max (n_{0},n_{1})\), \(\varPi _{n}\) is also a path in G.

Let \(n_{3} \in \mathbb{N}\) such that \(\forall n \geq n_{3}\), \(d(G,G_{n}) \leq \epsilon w_{\text{min}}\) with \(w_{\text{min}} = \text{min}_{(u,v)\in V ^{2},w(u,v)\neq 0}w(u,v)\). First, we note that

$$\displaystyle{\forall (u,v) \in V ^{2}\text{ with }w(u,v)\neq 0,\quad w_{ n}(u,v) \geq w(u,v) -\epsilon w_{\text{min}} \geq (1-\epsilon )w_{\text{min}}}$$

Then

$$\displaystyle\begin{array}{rcl} \vert \text{len}_{G_{n}}(\varPi _{n}) -\text{len}_{G}(\varPi _{n})\vert & =& \left \vert \sum _{i\in [0:k_{n}-1]}\left ( \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} - \dfrac{1} {w(a_{i}^{(n)},a_{i+1}^{(n)})}\right )\right \vert {}\\ & \leq & \sum _{i\in [0:k_{n}-1]}\frac{\vert w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) - w(a_{i}^{(n)},a_{i+1}^{(n)})\vert } {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) \times w(a_{i}^{(n)},a_{i+1}^{(n)})} {}\\ & \leq & \sum _{i\in [0:k_{n}-1]} \frac{\epsilon w_{\text{min}}} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) \times w(a_{i}^{(n)},a_{i+1}^{(n)})} {}\\ & \leq & \sum _{i\in [0:k_{n}-1]} \frac{\epsilon w_{\text{min}}} {(1-\epsilon )w_{\text{min}} \times w_{\text{min}}} {}\\ \vert \text{len}_{G_{n}}(\varPi _{n}) -\text{len}_{G}(\varPi _{n})\vert & \leq & \dfrac{\vert V \vert \epsilon } {w_{\text{min}}(1-\epsilon )} {}\\ \end{array}$$

Since \(\text{len}_{G}(\varPi _{n}) \geq \text{dist}_{G}(a,b)\) (it is a path between a and b), then \(\text{len}_{G_{n}}(\varPi _{n}) \geq \text{dist}_{G}(a,b) - \dfrac{\vert V \vert \epsilon } {w_{\text{min}}(1-\epsilon )}\). Therefore, \(\liminf \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi _{n}) \geq \text{dist}_{G}(a,b)\). Combined with Eq. 11:

$$\displaystyle{\limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b) \leq \liminf \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi _{n})}$$
$$\displaystyle{\Rightarrow \lim \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) = \text{dist}_{G}(a,b)}$$

which proves that the distance between any two nodes in a connected graph is continuous.

End of the proof of Lemma  1

We now prove the continuity of compactness itself, first on connected and then on disconnected clusters. In order to simplify notations, we directly work on the induced subgraphs, and we extend L to take a graph as an input:

$$\displaystyle\begin{array}{rcl} L(G)& =& \left \{\begin{array}{l l} 0 &\quad \mbox{if $\vert \mathrm{V} \vert$ = 1\ or\ G disconnected} \\ \dfrac{\sum _{(u,v)\in V ^{2}}w(u,v)} {\text{diam}(G)} &\quad \text{otherwise} \end{array} \right.{}\\ \end{array}$$

From Lemma 1, we know that dist(u, v) is continuous for all connected graphs, and dist(u, v) > 0. The maximum of continuous functions is continuous, which means that diam(G) is continuous for all connected graphs and diam(G) > 0. The combination of continuous functions is continuous, and 1∕x is continuous on \(\mathbb{R}^{+}\). We conclude that L(G) is continuous on all connected graphs.

We now prove that L(G) is continuous on disconnected graphs. Just as in Lemma 1, we take a Cauchy sequence of graphs \((G_{n})_{n\in \mathbb{N}}: G_{n} = (V,w_{n}) \rightarrow G = (V,w)\), but with G disconnected. For all \(n \in \mathbb{N}\), if \(G_{n}\) is disconnected, \(L(G_{n}) = 0 = L(G)\).

Since G is disconnected, \(\exists (u,v) \in V \times V\) such that every path \(\pi = (a_{0} = u,a_{1},\ldots,a_{k} = v) \in \text{paths}(u,v)\) between u and v has some \(i \in [0: k - 1]\) with \(w(a_{i},a_{i+1}) = 0\). If \(G_{n}\) is not disconnected, there exists a minimal path \(\varPi _{n} = (a_{0}^{(n)} = u,a_{1}^{(n)},\ldots,a_{k}^{(n)} = v) \in \text{paths}(u,v)\) (with \(\text{len}(\varPi _{n}) = \text{dist}_{G_{n}}(u,v)\)) such that \(\forall i \in [0: k - 1]\), \(w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) > 0\). By definition, \(\text{len}(\varPi _{n}) =\sum _{i\in [0:k-1]} \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} \geq \dfrac{1} {w_{\text{min}}^{(n)}}\) where \(w_{\text{min}}^{(n)} =\min _{i\in [0:k-1]}(w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}))\). Since \(G_{n}\) converges to G and G is disconnected, \(\lim \limits _{n\rightarrow +\infty }w_{\text{min}}^{(n)} = 0^{+}\). Therefore, \(\lim \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(u,v) =\lim \limits _{n\rightarrow +\infty }\min _{\pi \in \text{paths}(u,v)}\text{len}_{G_{n}}(\pi ) = +\infty \).

Since the diameter is the maximum of the distances between all pairs of nodes, \(\lim \limits _{n\rightarrow +\infty }\text{diam}(G_{n}) = +\infty \). By definition of the Cauchy sequence, \(\lim \limits _{n\rightarrow +\infty }\sum _{(u,v)\in V ^{2}}w_{n}(u,v) =\sum _{(u,v)\in V ^{2}}w(u,v)\). Therefore,

$$\displaystyle{\lim \limits _{n\rightarrow +\infty }L(G_{n}) =\lim \limits _{n\rightarrow +\infty }\dfrac{\sum _{(u,v)\in V ^{2}}w_{n}(u,v)} {\text{diam}(G_{n})} = 0 = L(G)}$$

which implies that L is continuous at every disconnected graph G.

Since compactness is the sum of L(G) applied to subgraphs induced by the clustering, compactness is continuous.
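The behaviour near a disconnected graph can be seen on a three-node path a-b-c whose second edge has a weight t that tends to zero. Under the definitions of this appendix the diameter is 1 + 1∕t and the internal weight is 2(1 + t), so the score is 2t, which tends to the value 0 assigned to the disconnected limit. The sketch below checks this numerically; the helper name `compactness` and the sampled values of t are illustrative assumptions.

```python
import itertools
import math


def compactness(weights, clustering):
    """Sum over clusters of (internal weight over ordered pairs) / diameter."""
    total = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        # An edge of weight w has length 1 / w; zero-weight edges are treated as absent.
        dist = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for u, v in itertools.combinations(nodes, 2):
            w = weights.get(frozenset((u, v)), 0.0)
            if w > 0:
                dist[u, v] = dist[v, u] = 1.0 / w
        for k in nodes:  # Floyd-Warshall restricted to the cluster
            for u in nodes:
                for v in nodes:
                    dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v])
        diam = max(dist.values())
        if len(nodes) < 2 or math.isinf(diam):
            continue  # singleton or disconnected cluster scores zero
        internal = sum(weights.get(frozenset((u, v)), 0.0)
                       for u in nodes for v in nodes if u != v)
        total += internal / diam
    return total


def score(t):
    """Compactness of the single cluster {a, b, c} on the path a-b-c with w(b, c) = t."""
    weights = {frozenset("ab"): 1.0, frozenset("bc"): t}
    return compactness(weights, [{"a", "b", "c"}])


samples = [1.0, 0.1, 0.01, 0.001, 0.0]
scores = [score(t) for t in samples]
for t, s in zip(samples, scores):
    print(t, s)
```

The scores shrink with t and reach exactly zero at t = 0, where the cluster becomes disconnected.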


© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Creusefond, J., Largillier, T., Peyronnet, S. (2017). A LexDFS-Based Approach on Finding Compact Communities. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51367-6_7


  • DOI: https://doi.org/10.1007/978-3-319-51367-6_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51366-9

  • Online ISBN: 978-3-319-51367-6

  • eBook Packages: Computer Science, Computer Science (R0)
