A LexDFS-Based Approach on Finding Compact Communities

Creusefond, Jean; Largillier, Thomas; Peyronnet, Sylvain

doi:10.1007/978-3-319-51367-6_7


Part of the book series: Lecture Notes in Social Networks (LNSN)


Abstract

This article presents an efficient hierarchical clustering algorithm based on a graph traversal algorithm called LexDFS. This traversal has the property of going through the clustered parts of the graph in a small number of iterations, making them recognisable. The time complexity of our method is O(n × log(n)). It is simple to implement, and a thorough study shows that it outputs clusterings that are closer to some ground truths than those of its competitors. Experiments are also carried out to analyse the behaviour of the algorithm during execution on sample graphs. This article also introduces a quality function called compactness, which measures how efficient a cluster is for internal communication. We prove that this quality function has interesting theoretical properties.


Notes

1.
http://snap.stanford.edu/data/.


Acknowledgements

The authors thank Loïck Lhote for his help with the proof of continuity.

Author information

Authors and Affiliations

Normandy University, Caen, France
Jean Creusefond & Thomas Largillier
ix-labs, Rouen and Qwant, Paris, France
Sylvain Peyronnet


Corresponding author

Correspondence to Jean Creusefond.

Editor information

Editors and Affiliations

Department of Computer Engineering, Firat University, Elazig, Turkey
Mehmet Kaya
Ministry of Interior, Ankara, Turkey
Özcan Erdoǧan
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Jon Rokne

1 Appendix: Proofs of Axioms Compliance

For technical reasons, we also need to define compactness when applied to disconnected clusters. A cluster on which information cannot spread has the lowest quality in our model; the compactness of a disconnected cluster is therefore defined to be zero.

1.1 Specific Notations

Some of the previous notations omitted the graph dependency for brevity (length, distance, diameter, compactness, etc.). When there is ambiguity, the graph will be specified as a subscript. A connected graph in this context is a graph G = (V, w) for which $\forall (u,v) \in V \times V$, dist(u, v) is defined, or equivalently there exists a path $\pi = (a_{0},\ldots,a_{\vert \pi \vert }) \in \text{paths}(u,v)$ between u and v such that $\forall i \in [0: \vert \pi \vert - 1]$, $w(a_{i},a_{i+1}) > 0$.

$\mathcal{P}(V )$ is the powerset of a set V of nodes, that is the set of all possible clusters.

$\mathcal{C}(V )$ is the set of possible partitions of a set V of nodes, that is $\{\{c_{1},\ldots,c_{k}\} \subseteq \mathcal{P}(V ) : \bigcup _{i}c_{i} = V,\ c_{i} \cap c_{j} = \emptyset \text{ for } i\neq j\}$.
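
To make the notation concrete, here is a minimal Python sketch of the compactness of a single cluster, reconstructed from the formulas used in this appendix: the length of a path is the sum of inverse weights, the diameter is the largest internal distance, and singletons and disconnected clusters score zero. The function name `cluster_compactness` and the dictionary layout of `weights` are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def cluster_compactness(cluster, weights):
    """Internal weight sum divided by internal diameter (0 if disconnected).

    `weights` maps frozenset({u, v}) to a positive weight (hypothetical
    layout); pairs that are absent have weight 0.
    """
    nodes = list(cluster)
    if len(nodes) < 2:
        return 0.0

    def w(u, v):
        return weights.get(frozenset((u, v)), 0.0)

    # The length of an edge is 1 / weight; absent edges are infinitely long.
    dist = {}
    for u in nodes:
        for v in nodes:
            if u == v:
                dist[u, v] = 0.0
            elif w(u, v) > 0:
                dist[u, v] = 1.0 / w(u, v)
            else:
                dist[u, v] = math.inf

    # Floyd-Warshall restricted to the subgraph induced by the cluster.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]

    diameter = max(dist.values())
    if math.isinf(diameter):
        return 0.0  # disconnected cluster

    # Sum over ordered pairs (i, j) in c^2, so each edge is counted twice.
    total = sum(w(i, j) for i in nodes for j in nodes if i != j)
    return total / diameter
```

The cubic all-pairs computation is chosen for brevity only; it is meant for small examples and says nothing about the complexity of the clustering algorithm itself.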

1.2 Permutation Invariance

Definition 1.

A graph clustering quality function Q is permutation invariant if for all graphs G = (V, w) and all isomorphisms f: V → V′, it is the case that $Q_{G}(C) = Q_{f(G)}(f(C))$, where f is extended to graphs and clusterings by f(C) = {{f(i) | i ∈ c} | c ∈ C} and f((V, w)) = (V′, (i, j) → w(f⁻¹(i), f⁻¹(j))).

Proposition 1.

Compactness is permutation invariant

Sketch of proof: Compactness only uses internal edges as an input, therefore it does not depend on representation.

Proof.

First, distances on weighted graphs as defined previously are permutation invariant:

$$\displaystyle\begin{array}{rcl} \text{dist}_{f(G)}(f(u),f(v))& =& \text{min}_{\pi \in \text{paths}_{f(G)}(f(u),f(v))}(\text{len}_{f(G)}(\pi )) {}\\ & =& \text{min}_{\pi \in \text{paths}_{f(G)}(f(u),f(v))}(\text{len}_{G}(f^{-1}(\pi ))) {}\\ & =& \text{min}_{\pi ' \in \text{paths}_{G}(u,v)}(\text{len}_{G}(\pi ')) {}\\ & =& \text{dist}_{G}(u,v) {}\\ \end{array}$$

Compactness is permutation invariant:

$$\displaystyle\begin{array}{rcl} L_{f(G)}(f(C))& =& \sum _{f(c)\in f(C)}\sum _{(f(i),f(j))\in f(c)^{2}} \frac{w_{f(G)}(f(i),f(j))} {\text{max}_{(f(u),f(v))\in f(c)^{2}}(\text{dist}_{f(G)}(f(u),f(v)))} {}\\ & =& \sum _{c\in C}\sum _{(i,j)\in c^{2}} \frac{w(i,j)} {\text{max}_{(u,v)\in c^{2}}(\text{dist}_{G}(u,v))} {}\\ & =& L_{G}(C) {}\\ \end{array}$$
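
The invariance can also be checked numerically on a toy graph. The sketch below is an illustration under assumptions, not the authors' code: `compactness` and `relabel` are hypothetical helper names, and the graph is stored as a dictionary with one `(u, v)` key per undirected edge and no self-loops.

```python
import math


def compactness(clustering, weights):
    """Sum over clusters of (internal weight) / (internal diameter)."""
    score = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        d = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for (u, v), weight in weights.items():
            if u != v and u in cluster and v in cluster and weight > 0:
                d[u, v] = d[v, u] = 1.0 / weight  # edge length is 1 / weight
        for k in nodes:  # Floyd-Warshall on the induced subgraph
            for i in nodes:
                for j in nodes:
                    d[i, j] = min(d[i, j], d[i, k] + d[k, j])
        diam = max(d.values())
        if 0 < diam < math.inf:  # singletons and disconnected clusters add 0
            internal = sum(2 * wt for (u, v), wt in weights.items()
                           if u in cluster and v in cluster)
            score += internal / diam
    return score


def relabel(clustering, weights, f):
    """Apply the node bijection f (a dict) to a clustering and a graph."""
    new_clustering = [{f[u] for u in c} for c in clustering]
    new_weights = {(f[u], f[v]): wt for (u, v), wt in weights.items()}
    return new_clustering, new_weights


weights = {("a", "b"): 2.0, ("b", "c"): 1.0, ("c", "d"): 4.0}
clustering = [{"a", "b", "c"}, {"d"}]
f = {"a": 10, "b": 20, "c": 30, "d": 40}

before = compactness(clustering, weights)
after = compactness(*relabel(clustering, weights, f))
print(before, after)
```

Both calls see the same internal weights and the same internal distances, only under different node names, which is the content of the proof above.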

1.3 Scale Invariance

Definition 2.

A graph clustering quality function Q is scale invariant if for all graphs G = (V, w), all clusterings C₁, C₂ of G and all constants α > 0, $Q_{G}(C_{1}) \leq Q_{G}(C_{2})$ if and only if $Q_{\alpha G}(C_{1}) \leq Q_{\alpha G}(C_{2})$, where α G = (V, (i, j) → α w(i, j)) is a graph with edge weights scaled by a factor α.

Proposition 2.

Compactness is scale invariant

Sketch of proof: Multiplying the edge weights by a factor α implies that the diameter is multiplied by 1∕α. Since the numerator of every term in the compactness sum is multiplied by α, the total score is multiplied by α², therefore the ranking of the clusterings is preserved.

Proof.

$$\displaystyle\begin{array}{rcl} \text{len}_{\alpha G}(v_{1},\ldots,v_{k})& =& \sum _{i=1}^{k-1} \frac{1} {\alpha w(v_{i},v_{i+1})} {}\\ \text{len}_{\alpha G}(v_{1},\ldots,v_{k})& =& \frac{len_{G}(v_{1},\ldots,v_{k})} {\alpha } {}\\ \end{array}$$

Since the length of every path is divided by the same factor α, the minimum paths are the same in G and in α G, therefore:

$$\displaystyle\begin{array}{rcl} \text{dist}_{\alpha G}(u,v)& =& \text{min}_{\pi \in \text{paths}_{\alpha G}(u,v)}\left (\text{len}_{\alpha G}(\pi )\right ) {}\\ & =& \text{min}_{\pi \in \text{paths}_{\alpha G}(u,v)}(\frac{\text{len}_{G}(\pi )} {\alpha } ) {}\\ & =& \frac{\text{dist}_{G}(u,v)} {\alpha } {}\\ \end{array}$$

Using the same reasoning, $\text{diam}_{\alpha G}(c) = \text{diam}_{G}(c)/\alpha$. The compactness can therefore be written as:

$$\displaystyle\begin{array}{rcl} L_{\alpha G}(C)& =& \sum _{c\in C}\sum _{(i,j)\in c^{2}} \frac{w_{\alpha G}(i,j)} {\text{diam}_{\alpha G}(c)} {}\\ & =& \sum _{c\in C}\sum _{(i,j)\in c^{2}} \frac{\alpha w_{G}(i,j)} {\text{diam}_{G}(c)/\alpha } {}\\ & =& \sum _{c\in C}\sum _{(i,j)\in c^{2}}\alpha ^{2} \frac{w_{G}(i,j)} {\text{diam}_{G}(c)} {}\\ & =& \alpha ^{2}L_{G}(C) {}\\ \end{array}$$

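Therefore, for all clusterings C₁, C₂, if $L_{G}(C_{1}) \geq L_{G}(C_{2})$ then $\alpha ^{2}L_{G}(C_{1}) \geq \alpha ^{2}L_{G}(C_{2})$, implying $L_{\alpha G}(C_{1}) \geq L_{\alpha G}(C_{2})$. Since α² > 0, the converse holds by the same argument, which gives the required equivalence.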
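
As a small worked check of the α² factor (an illustrative example, not taken from the original text), consider a path graph a-b-c with w(a, b) = w(b, c) = 1, the single cluster c = {a, b, c} and α = 2:

$$\displaystyle\begin{array}{rcl} \text{diam}_{G}(c)& =& \frac{1} {1} + \frac{1} {1} = 2,\qquad L_{G}(c) = \frac{2 \times (1 + 1)} {2} = 2 {}\\ \text{diam}_{2G}(c)& =& \frac{1} {2} + \frac{1} {2} = 1,\qquad L_{2G}(c) = \frac{2 \times (2 + 2)} {1} = 8 = 2^{2} \times L_{G}(c) {}\\ \end{array}$$

The factor 2 in each numerator comes from summing over ordered pairs (i, j) ∈ c², so every edge is counted twice.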

1.4 Richness

Definition 3.

A graph clustering quality function Q is rich if for all sets V and all non-trivial partitions $C^{\ast}$ of V, there is a graph G = (V, w) such that $C^{\ast}$ is the Q-optimal clustering of V, i.e. $\text{argmax}_{C}Q_{G}(C) = C^{\ast}$.

Proposition 3.

Compactness is rich

Sketch of proof: For a graph in which all clusters in C are cliques and there is no edge between clusters, C is the most compact clustering since any other has either:

disconnected clusters, which have a zero compactness, and separating them into connected clusters improves the compactness
multiple clusters including nodes belonging in the same clique, and merging these clusters adds the edges that are between them to the original score

Proof.

$\forall C^{{\ast}}\in \mathcal{C}(V )$, let G = (V, w) be a graph such that, $\forall (i,j) \in V ^{2}$, w(i, j) = 1 if $\exists c \in C^{{\ast}}$ such that i ∈ c and j ∈ c (i and j belong to the same cluster) and w(i, j) = 0 otherwise. G is therefore a clique graph (all connected components are cliques), where the disconnected cliques are the clusters in C ^∗. Let C be an optimal clustering of G w.r.t. compactness, i.e. $\text{argmax}_{D\in \mathcal{C}(V )}L_{G}(D) = C$.

If $\exists c \in C$ such that $\exists (i,j) \in c^{2}$, w(i, j) = 0, then i and j are not in the same cluster in $C^{\ast}$, and c is disconnected, therefore its compactness equals zero. When separating c into multiple connected clusters $(c_{1},\ldots,c_{k})$, note that the compactness of each of these clusters is non-negative, and therefore greater than or equal to the compactness of c. Let C′ be the clustering $C' = (C\setminus \{c\}) \cup \{c_{1},\ldots,c_{k}\}$. Then

$$\displaystyle\begin{array}{rcl} L(C')& =& L(C\setminus \{c\}) + L(\{c_{1},\ldots,c_{k}\}) {}\\ & \geq & L(C\setminus \{c\}) + L(c) {}\\ \Leftrightarrow L(C')& \geq & L(C) {}\\ \end{array}$$

The scores are equal iff all nodes in c have zero degree. Therefore, $\forall c \in C$, c is connected in G or composed of zero-degree nodes (connectedness condition).

If $\exists (c_{1},c_{2}) \in C^{2},c_{1}\neq c_{2}$ such that $\exists i \in c_{1}$, $j \in c_{2}$, w(i, j) = 1 (i.e. there is an edge between two clusters in C), then i and j are in the same cluster in $C^{\ast}$ and in different clusters in C. Because the nodes i and j have a non-zero degree, the clusters c₁ and c₂ are both connected. Noting that the distance between any connected pair of nodes is always 1 on this graph, we have diam(c₁) = diam(c₂) = diam(c₁ ∪ c₂) = 1. We call C′ the clustering obtained from C by replacing c₁ and c₂ with their union, that is $C' = (C\setminus \{c_{1},c_{2}\}) \cup \{c_{1} \cup c_{2}\}$.

$$\displaystyle\begin{array}{rcl} L(C')& =& L(C\setminus \{c_{1},c_{2}\}) + L(\{c_{1} \cup c_{2}\}) {}\\ & \geq & L(C\setminus \{c_{1},c_{2}\}) + L(c_{1}) + L(c_{2}) + \frac{w(i,j)} {\text{diam}(\{c_{1} \cup c_{2}\})} {}\\ & >& L(C\setminus \{c_{1},c_{2}\}) + L(c_{1}) + L(c_{2}) {}\\ \Leftrightarrow L(C')& >& L(C) {}\\ \end{array}$$

Since C is optimal w.r.t. L, there is no edge between the clusters of C, which is equivalent to $\forall c' \in C^{{\ast}}$, $\exists c \in C$, c′ ⊆ c (maximality condition).

Both conditions imply that any cluster in C is either a maximal connected component (and therefore a cluster in C ^∗) or a set of zero-degree nodes. Since any set containing a zero-degree node has the same compactness (zero), C ^∗ has the same compactness as any other maximal clustering. Therefore, C ^∗ is a maximum-compactness clustering of G.
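
The construction in the proof can be checked by brute force on a small node set. The following Python sketch is an illustration under assumptions (the helper names `compactness`, `partitions` and `clique_graph` are hypothetical, and edges are stored as one `(u, v)` key per undirected pair): it builds the clique graph of a target partition and searches all partitions of V for the most compact one.

```python
import math


def compactness(clustering, weights):
    """Sum over clusters of (internal weight) / (internal diameter)."""
    score = 0.0
    for cluster in clustering:
        nodes = list(cluster)
        d = {(u, v): 0.0 if u == v else math.inf for u in nodes for v in nodes}
        for (u, v), weight in weights.items():
            if u != v and u in cluster and v in cluster and weight > 0:
                d[u, v] = d[v, u] = 1.0 / weight
        for k in nodes:  # Floyd-Warshall on the induced subgraph
            for i in nodes:
                for j in nodes:
                    d[i, j] = min(d[i, j], d[i, k] + d[k, j])
        diam = max(d.values())
        if 0 < diam < math.inf:  # singletons and disconnected clusters add 0
            internal = sum(2 * wt for (u, v), wt in weights.items()
                           if u in cluster and v in cluster)
            score += internal / diam
    return score


def partitions(items):
    """Yield every partition of a list as a list of sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [{first}] + part  # `first` alone in a new block
        for i in range(len(part)):  # or `first` added to an existing block
            yield part[:i] + [part[i] | {first}] + part[i + 1:]


def clique_graph(target):
    """Weight 1 inside each block of `target`, weight 0 (absent) elsewhere."""
    weights = {}
    for block in target:
        members = sorted(block)
        for idx, u in enumerate(members):
            for v in members[idx + 1:]:
                weights[u, v] = 1.0
    return weights


target = [{0, 1, 2}, {3, 4}]
weights = clique_graph(target)
nodes = sorted(set().union(*target))
best = max(partitions(nodes), key=lambda p: compactness(p, weights))
print(best, compactness(best, weights))
```

The target here contains no singleton block; as noted in the proof, clusterings that differ only in how zero-degree nodes are grouped have the same compactness, so with singletons the maximum would not be unique.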

1.5 Monotonicity

Definition 4.

Let G = (V, w) be a graph and C a clustering of G. A graph G′ = (V, w′) is a C-consistent improvement of G if for all nodes i and j, w′(i, j) ≥ w(i, j) whenever i is in the same community as j and w′(i, j) ≤ w(i, j) whenever i is not in the same community as j.

Definition 5.

A graph clustering quality function Q is monotonic if for all graphs G, all clusterings C of G and all C-consistent improvements G′ of G it is the case that $Q_{G'}(C) \geq Q_{G}(C)$.

Proposition 4.

Compactness is monotonic

Sketch of proof: Compactness is not influenced by between-clusters weights, and increasing the weight inside clusters can only decrease or not affect the diameter. Therefore, compactness is either unaffected or increased by such a modification.

Proof.

Since compactness is insensitive to the weights of external (between-cluster) edges, we focus on intra-cluster edges.

$\forall c \in C$ and for every path $\varPi$ in the subgraph of G induced by c, $\text{len}_{G}(\varPi ) \geq \text{len}_{G'}(\varPi )$ (since all internal weights have increased or stayed the same). Since this is true for all paths, $\text{diam}_{G}(c) \geq \text{diam}_{G'}(c)$. Therefore, since no numerator decreases and the denominator does not increase, $L_{G}(c) =\sum _{(i,j)\in c^{2}}w(i,j)/\text{diam}_{G}(c) \leq \sum _{(i,j)\in c^{2}}w'(i,j)/\text{diam}_{G'}(c) = L_{G'}(c)$.

A consistent improvement thus implies an equal or increased compactness, which proves monotonicity.
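
For a concrete instance (an illustrative example, not part of the original proof), take the path graph a-b-c with w(a, b) = w(b, c) = 1 and the single cluster c = {a, b, c}, and let G′ raise the internal weight w′(a, b) to 2 while keeping w′(b, c) = 1:

$$\displaystyle\begin{array}{rcl} L_{G}(c)& =& \frac{2 \times (1 + 1)} {\frac{1} {1} + \frac{1} {1}} = \frac{4} {2} = 2 {}\\ L_{G'}(c)& =& \frac{2 \times (2 + 1)} {\frac{1} {2} + \frac{1} {1}} = \frac{6} {3/2} = 4 \geq L_{G}(c) {}\\ \end{array}$$

Both effects named in the sketch of proof are visible here: the numerator grows from 4 to 6 and the diameter shrinks from 2 to 3∕2.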

1.6 Locality

Definition 6.

Let G ₁ = (V ₁, w ₁) and G ₂ = (V ₂, w ₂) be two graphs and let V _a ⊆ V ₁ ∩ V ₂ be a subset of the common nodes. We say that the graphs agree on V _a if w ₁(i, j) = w ₂(i, j) for all i, j ∈ V _a. We say that the graphs also agree on the neighbourhood of V _a if

w ₁(i, j) = w ₂(i, j) for all i ∈ V _a and j ∈ V ₁ ∩ V ₂,
w ₁(i, j) = 0 for all i ∈ V _a and j ∈ V ₁∖V ₂, and
w ₂(i, j) = 0 for all i ∈ V _a and j ∈ V ₂∖V ₁.

This means that for nodes in V _a the weights and endpoints of incident edges are exactly the same in the two graphs.

Definition 7.

A graph clustering quality function Q is local if for all graphs G ₁ = (V ₁, w ₁) and G ₂ = (V ₂, w ₂) that agree on a set V _a and its neighbourhood, and for all clusterings C _a, D _a of V _a, C ₁ of V ₁∖V _a and C ₂ of V ₂∖V _a, if $Q_{G_{1}}(C_{a} \cup C_{1}) \geq Q_{G_{1}}(D_{a} \cup C_{1})$ then $Q_{G_{2}}(C_{a} \cup C_{2}) \geq Q_{G_{2}}(D_{a} \cup C_{2})$.

Proposition 5.

Compactness is local

Sketch of proof: Thanks to additivity properties of compactness and the fact that it only uses internal data, any clustering preference on a graph G is kept on a graph G′ that would include it.

Proof.

Let G ₁ = (V ₁, w ₁) and G ₂ = (V ₂, w ₂) be two graphs that agree on a set V _a and its neighbourhood. By definition, $\forall (i,j) \in V _{a}^{2}$, w ₁(i, j) = w ₂(i, j). Since compactness only takes into account internal edges and internal paths, and since G ₁ and G ₂ agree on V _a and its internal edges, $\forall c \in \mathcal{P}(V _{a})$, $L_{G_{1}}(c) = L_{G_{2}}(c)$. Therefore, $\forall C \in \mathcal{C}(V _{a})$, $L_{G_{1}}(C) = L_{G_{2}}(C)$.

We immediately get $\forall (C_{a},D_{a}) \in \mathcal{C}(V _{a})^{2}$, $\forall C_{1} \in \mathcal{C}(V _{1}\setminus V _{a})$ and $\forall C_{2} \in \mathcal{C}(V _{2}\setminus V _{a})$

$$\displaystyle\begin{array}{rcl} & & L_{G_{1}}(C_{a} \cup C_{1}) \geq L_{G_{1}}(D_{a} \cup C_{1}) {}\\ & \Leftrightarrow & L_{G_{1}}(C_{a}) + L_{G_{1}}(C_{1}) \geq L_{G_{1}}(D_{a}) + L_{G_{1}}(C_{1}) {}\\ & \Leftrightarrow & L_{G_{1}}(C_{a}) \geq L_{G_{1}}(D_{a}) {}\\ & \Leftrightarrow & L_{G_{2}}(C_{a}) \geq L_{G_{2}}(D_{a}) {}\\ & \Leftrightarrow & L_{G_{2}}(C_{a}) + L_{G_{2}}(C_{2}) \geq L_{G_{2}}(D_{a}) + L_{G_{2}}(C_{2}) {}\\ & \Leftrightarrow & L_{G_{2}}(C_{a} \cup C_{2}) \geq L_{G_{2}}(D_{a} \cup C_{2}) {}\\ \end{array}$$

1.7 Continuity

Definition 8.

A quality function Q is continuous if a small change in the graph leads to a small change in the quality. Formally, Q is continuous if for every ε > 0 and every graph G = (V, w), there exists a δ > 0 such that for all graphs G′ = (V, w′), if w(i, j) − δ < w′(i, j) < w(i, j) + δ for all nodes i and j, then $Q_{G'}(C) -\epsilon < Q_{G}(C) < Q_{G'}(C) +\epsilon$ for all clusterings C of G.

Proposition 6.

Compactness is continuous

Sketch of proof: First, we prove that the distance function is continuous for connected graphs. To that aim, we show that for any Cauchy sequence of graphs converging to a graph G, the distances in the sequence converge to the distance in G. The diameter is therefore continuous, and continuity is ensured at disconnected graphs by showing that the diameter goes to infinity as a graph gets close to a disconnected one.

Proof.

We note that this definition of continuity corresponds to the standard continuity of a multivariate function, with the distance between two graphs being the maximum of the absolute difference in edge weights. We call this distance function d(G, G′). Therefore, we can use known properties, such as the continuity of the combination of continuous functions, etc.
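
The distance d(G, G′) between two graphs on the same node set can be sketched in a few lines of Python. The dictionary representation (one `(u, v)` key per node pair, using the same key orientation in both graphs, with missing pairs meaning weight 0) and the name `graph_distance` are assumptions made for illustration.

```python
def graph_distance(w1, w2):
    """Maximum absolute difference in edge weights between two graphs.

    Both arguments map a node pair (u, v) to a weight and must use the
    same key orientation; pairs missing from a dictionary have weight 0.
    """
    pairs = set(w1) | set(w2)
    return max((abs(w1.get(p, 0.0) - w2.get(p, 0.0)) for p in pairs),
               default=0.0)


g = {("a", "b"): 1.0}
g_prime = {("a", "b"): 0.75, ("b", "c"): 0.5}
print(graph_distance(g, g_prime))
```

With this distance, the condition w(i, j) − δ < w′(i, j) < w(i, j) + δ for all i and j in Definition 8 is exactly d(G, G′) < δ.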

Lemma 1.

For a connected graph G = (V,w), $\forall (a,b) \in V \times V$ , dist _G (a,b) is continuous.

Proof.

Let $G_{n} = (V,w_{n})$ be a Cauchy sequence of graphs. Then, $\forall (i,j) \in V ^{2}$, $(w_{n}(i,j))_{n\in \mathbb{N}}$ is also a Cauchy sequence, therefore $\exists w$ such that $w_{n}(i,j) \rightarrow w(i,j)$ (in this context, → means “converges to”) and a graph G = (V, w) such that $G_{n} \rightarrow G$. We assume G to be connected.

$\forall (a,b) \in V ^{2}$, let $\varPi = (a_{0} = a,a_{1},\ldots,a_{k-1},a_{k} = b)$ be a path such that $\text{dist}_{G}(a,b) = \text{len}_{G}(\varPi )$, that is, a minimal path in G between a and b. If $\exists i \in [0: k - 1]$ such that $w(a_{i},a_{i+1}) = 0$, then $\text{len}_{G}(\varPi )$ is undefined and therefore Π is not a minimal path. Since a minimal path exists (due to the definition of a connected graph), $\forall i \in [0: k - 1]$, $w(a_{i},a_{i+1}) > 0$.

Since $\forall i \in [0: k - 1]$, $w(a_{i},a_{i+1})\neq 0$, the function $f(x_{0},\ldots,x_{k-1}) =\sum _{i\in [0:k-1]} \frac{1} {x_{i}}$ is continuous at $(w(a_{0},a_{1}),\ldots,w(a_{k-1},a_{k}))$. Therefore, $\text{len}_{G_{n}}(\varPi ) \rightarrow \text{len}_{G}(\varPi )$.

Since $\text{dist}_{G_{n}}(a,b) \leq \text{len}_{G_{n}}(\varPi )$,

$$\displaystyle\begin{array}{rcl} \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)& \leq & \limsup \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi ) {}\\ & =& \lim \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi ) = \text{len}_{G}(\varPi ) = \text{dist}_{G}(a,b) {}\\ \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)& \leq & \text{dist}_{G}(a,b) {}\\ \end{array}$$

$\liminf \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b)$, therefore

$$\displaystyle{ \liminf \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b) \qquad (11)}$$

Let ε > 0; without loss of generality we take ε < 1. Since $\limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b)$, $\exists n_{0} \in \mathbb{N}$ such that $\forall n \geq n_{0}$, $\text{dist}_{G_{n}}(a,b) \leq (1+\epsilon )\text{dist}_{G}(a,b)$.

Let $n_{1} \in \mathbb{N}$ be such that $\forall n \geq n_{1}$, $d(G,G_{n}) \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}$.

By definition of d, for all $n \geq n_{1}$ and $\forall (i,j) \in V \times V$, $\vert w_{n}(i,j) - w(i,j)\vert \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}$.

For $n \in \mathbb{N}$, let $\varPi _{n}$ be a path such that $\text{dist}_{G_{n}}(a,b) = \text{len}_{G_{n}}(\varPi _{n})$. If $\varPi _{n}$ is not a path in G for some $n \geq \max (n_{0},n_{1})$, then for $\varPi _{n} = (a_{0}^{(n)},\ldots,a_{k_{n}}^{(n)})$, $\exists i \in [0: k_{n} - 1]$ such that $w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})\neq 0$ and $w(a_{i}^{(n)},a_{i+1}^{(n)}) = 0$. In that case,

$$\displaystyle{\vert w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})\vert = \vert w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) - w(a_{i}^{(n)},a_{i+1}^{(n)})\vert \leq \dfrac{1} {2 \times \text{dist}_{G}(a,b)}}$$

$$\displaystyle{\Rightarrow \text{len}_{G_{n}}(\varPi _{n}) \geq \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} \geq 2 \times \text{dist}_{G}(a,b)}$$

However, this is contradictory, since $\text{len}_{G_{n}}(\varPi _{n}) = \text{dist}_{G_{n}}(a,b) \leq (1+\epsilon )\text{dist}_{G}(a,b)$ for n ≥ n ₀. Then, for n ≥ max(n ₀, n ₁), Π _n is also a path in G.

Let $n_{3} \in \mathbb{N}$ be such that $\forall n \geq n_{3}$, $d(G,G_{n}) \leq \epsilon w_{\text{min}}$ with $w_{\text{min}} = \text{min}_{(u,v)\in V ^{2},w(u,v)\neq 0}w(u,v)$. First, we note that for every pair of nodes with non-zero weight in G,

$$\displaystyle{\forall (u,v) \in V ^{2}\text{ with }w(u,v)\neq 0,\quad w_{n}(u,v) \geq w(u,v) -\epsilon w_{\text{min}} \geq (1-\epsilon )w_{\text{min}}}$$

Then

$$\displaystyle\begin{array}{rcl} \vert \text{len}_{G_{n}}(\varPi _{n}) -\text{len}_{G}(\varPi _{n})\vert & =& \left \vert \sum _{i\in [0:k_{n}-1]}\left ( \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} - \dfrac{1} {w(a_{i}^{(n)},a_{i+1}^{(n)})}\right )\right \vert {}\\ & \leq & \sum _{i\in [0:k_{n}-1]}\frac{\vert w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) - w(a_{i}^{(n)},a_{i+1}^{(n)})\vert } {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) \times w(a_{i}^{(n)},a_{i+1}^{(n)})} {}\\ & \leq & \sum _{i\in [0:k_{n}-1]} \frac{\epsilon w_{\text{min}}} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) \times w(a_{i}^{(n)},a_{i+1}^{(n)})} {}\\ & \leq & \sum _{i\in [0:k_{n}-1]} \frac{\epsilon w_{\text{min}}} {(1-\epsilon )w_{\text{min}} \times w_{\text{min}}} {}\\ \vert \text{len}_{G_{n}}(\varPi _{n}) -\text{len}_{G}(\varPi _{n})\vert & \leq & \dfrac{\vert V \vert \epsilon } {w_{\text{min}}(1-\epsilon )} {}\\ \end{array}$$

Since $\text{len}_{G}(\varPi _{n}) \geq \text{dist}_{G}(a,b)$ ($\varPi _{n}$ is a path between a and b in G), $\text{len}_{G_{n}}(\varPi _{n}) \geq \text{dist}_{G}(a,b) - \dfrac{\vert V \vert \epsilon } {w_{\text{min}}(1-\epsilon )}$. Since ε is arbitrary, $\liminf \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi _{n}) \geq \text{dist}_{G}(a,b)$. Recalling that $\text{len}_{G_{n}}(\varPi _{n}) = \text{dist}_{G_{n}}(a,b)$ and combining with Eq. 11:

$$\displaystyle{\limsup \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) \leq \text{dist}_{G}(a,b) \leq \liminf \limits _{n\rightarrow +\infty }\text{len}_{G_{n}}(\varPi _{n})}$$

$$\displaystyle{\Rightarrow \lim \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(a,b) = \text{dist}_{G}(a,b)}$$

which proves that the distance between any two nodes in a connected graph is continuous.

End of the proof of Lemma 1

We now prove the continuity of compactness itself, including on disconnected clusters. In order to simplify notations, we directly work on the induced subgraphs, and we extend L to take a graph as an input:

$$\displaystyle\begin{array}{rcl} L(G)& =& \left \{\begin{array}{l l} 0 &\quad \text{if }\vert V \vert = 1\text{ or }G\text{ is disconnected} \\ \dfrac{\sum _{(u,v)\in V ^{2}}w(u,v)} {\text{diam}(G)} &\quad \text{otherwise} \end{array} \right.{}\\ \end{array}$$

From Lemma 1, we know that dist(u, v) is continuous for all connected graphs, and dist(u, v) > 0. The maximum of continuous functions is continuous, which means that diam(G) is continuous for all connected graphs and diam(G) > 0. The combination of continuous functions is continuous, and 1∕x is continuous on $\mathbb{R}^{+}$. We conclude that L(G) is continuous on all connected graphs.

We now prove that L(G) is continuous on unconnected graphs. Just as in Lemma 1, we take a Cauchy sequence of graphs $(G_{n})_{n\in \mathbb{N}}: G_{n} = (V,w_{n}) \rightarrow G = (V,w)$, but with G disconnected. For all $n \in \mathbb{N}$, if G _n is disconnected, L(G _n) = 0 = L(G).

Since G is disconnected, $\exists (u,v) \in V \times V$ such that every path $\pi = (a_{0},\ldots,a_{k}) \in \text{paths}(u,v)$ between u and v has some $i \in [0: k - 1]$ with $w(a_{i},a_{i+1}) = 0$. If $G_{n}$ is not disconnected, there exists a minimal path $\varPi _{n} = (a_{0}^{(n)} = u,a_{1}^{(n)},\ldots,a_{k}^{(n)} = v) \in \text{paths}(u,v)$ (that is, $\text{len}_{G_{n}}(\varPi _{n}) = \text{dist}_{G_{n}}(u,v)$) such that $\forall i \in [0: k - 1]$, $w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}) > 0$. By definition, $\text{len}_{G_{n}}(\varPi _{n}) =\sum _{i\in [0:k-1]} \dfrac{1} {w_{n}(a_{i}^{(n)},a_{i+1}^{(n)})} \geq \dfrac{1} {w_{\text{min}}^{(n)}}$ where $w_{\text{min}}^{(n)} = \text{min}_{i\in [0:k-1]}(w_{n}(a_{i}^{(n)},a_{i+1}^{(n)}))$. Since $G_{n}$ converges to G and G is disconnected, $\lim \limits _{n\rightarrow +\infty }w_{\text{min}}^{(n)} = 0^{+}$. Therefore, $\lim \limits _{n\rightarrow +\infty }\text{dist}_{G_{n}}(u,v) =\lim \limits _{n\rightarrow +\infty }\min _{\pi \in \text{paths}(u,v)}\text{len}_{G_{n}}(\pi ) = +\infty $.

Since the diameter is the maximum of the distances between all pairs of nodes, $\lim \limits _{n\rightarrow +\infty }\text{diam}(G_{n}) = +\infty $. By definition of the Cauchy sequence, $\lim \limits _{n\rightarrow +\infty }\sum _{(u,v)\in V ^{2}}w_{n}(u,v) =\sum _{(u,v)\in V ^{2}}w(u,v)$. Therefore,

$$\displaystyle{\lim \limits _{n\rightarrow +\infty }L(G_{n}) =\lim \limits _{n\rightarrow +\infty }\frac{\sum _{(u,v)\in V ^{2}}w_{n}(u,v)} {\text{diam}(G_{n})} = 0 = L(G)}$$

which implies that L is continuous at every disconnected graph G.

Since compactness is the sum of L(G) applied to subgraphs induced by the clustering, compactness is continuous.
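
A one-parameter family illustrates the behaviour near a disconnected graph (an illustrative example, not part of the original proof). Let $G_{t}$ be the path graph a-b-c with w(a, b) = 1 and w(b, c) = t > 0, so that $G_{0}$ is disconnected and $L(G_{0}) = 0$. The diameter of $G_{t}$ is the distance between a and c, hence

$$\displaystyle\begin{array}{rcl} \text{diam}(G_{t})& =& 1 + \frac{1} {t} {}\\ L(G_{t})& =& \frac{2 \times (1 + t)} {1 + \frac{1} {t}} = \frac{2t(1 + t)} {t + 1} = 2t {}\\ \lim \limits _{t\rightarrow 0^{+}}L(G_{t})& =& 0 = L(G_{0}) {}\\ \end{array}$$

The numerator stays bounded while the diameter diverges, which is the mechanism used in the proof.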


Cite this chapter

Creusefond, J., Largillier, T., Peyronnet, S. (2017). A LexDFS-Based Approach on Finding Compact Communities. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51367-6_7


DOI: https://doi.org/10.1007/978-3-319-51367-6_7
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51366-9
Online ISBN: 978-3-319-51367-6
eBook Packages: Computer Science, Computer Science (R0)
