SybilBlind: Detecting Fake Users in Online Social Networks Without Manual Labels

  • Conference paper
Research in Attacks, Intrusions, and Defenses (RAID 2018)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11050)

Abstract

Detecting fake users (also called Sybils) in online social networks is a basic security research problem. State-of-the-art approaches rely on a large amount of manually labeled users as a training set. These approaches suffer from three key limitations: (1) it is time-consuming and costly to manually label a large training set, (2) they cannot detect new Sybils in a timely fashion, and (3) they are vulnerable to Sybil attacks that leverage information of the training set. In this work, we propose SybilBlind, a structure-based Sybil detection framework that does not rely on a manually labeled training set. SybilBlind works under the same threat model as state-of-the-art structure-based methods. We demonstrate the effectiveness of SybilBlind using (1) a social network with synthetic Sybils and (2) two Twitter datasets with real Sybils. For instance, SybilBlind achieves an AUC of 0.98 on a Twitter dataset.

B. Wang and L. Zhang contributed equally to this work.

Notes

  1. Our framework can also be generalized to directed social networks.

  2. The local community detection method [26] requires labeled benign nodes and thus cannot be used to detect Sybils without a manually labeled training set.

  3. http://home.engineering.iastate.edu/~neilgong/dataset.html

  4. https://sites.google.com/site/findcommunities/

References

  1. 1 in 10 Twitter accounts is fake. http://goo.gl/qTYbyy

  2. Alvisi, L., Clement, A., Epasto, A., Lattanzi, S., Panconesi, A.: SoK: the evolution of sybil defense via social networks. In: IEEE S & P (2013)

  3. Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)

  4. Benevenuto, F., Magno, G., Rodrigues, T., Almeida, V.: Detecting spammers on Twitter. In: CEAS (2010)

  5. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. (2008)

  6. Boshmaf, Y., Logothetis, D., Siganos, G., Leria, J., Lorenzo, J.: Integro: leveraging victim prediction for robust fake account detection in OSNs. In: NDSS (2015)

  7. Cao, Q., Sirivianos, M., Yang, X., Pregueiro, T.: Aiding the detection of fake accounts in large scale social online services. In: NSDI (2012)

  8. Danezis, G., Mittal, P.: SybilInfer: detecting Sybil nodes using social networks. In: NDSS (2009)

  9. Fu, H., Xie, X., Rui, Y., Gong, N.Z., Sun, G., Chen, E.: Robust spammer detection in microblogs: leveraging user carefulness. ACM Trans. Intell. Syst. Technol. (TIST) (2017)

  10. Gao, H., Chen, Y., Lee, K., Palsetia, D., Choudhary, A.: Towards online spam filtering in social networks. In: NDSS (2012)

  11. Gao, P., Wang, B., Gong, N.Z., Kulkarni, S., Thomas, K., Mittal, P.: SybilFuse: Combining local attributes with global structure to perform robust Sybil detection. In: IEEE CNS (2018)

  12. Ghosh, S., et al.: Understanding and combating link farming in the Twitter social network. In: WWW (2012)

  13. Gilbert, E., Karahalios, K.: Predicting tie strength with social media. In: CHI (2009)

  14. Gong, N.Z., Frank, M., Mittal, P.: SybilBelief: a semi-supervised learning approach for structure-based Sybil detection. IEEE TIFS 9(6), 976–987 (2014)

  15. Hacking Election, May 2016. http://goo.gl/G8o9x0

  16. Hacking Financial Market, May 2016. http://goo.gl/4AkWyt

  17. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  18. Jia, J., Wang, B., Gong, N.Z.: Random walk based fake account detection in online social networks. In: IEEE DSN, pp. 273–284 (2017)

  19. Kontaxis, G., Polakis, I., Ioannidis, S., Markatos, E.P.: Detecting social network profile cloning. In: IEEE PERCOM Workshops (2011)

  20. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600. ACM (2010)

  21. Liu, C., Gao, P., Wright, M., Mittal, P.: Exploiting temporal dynamics in Sybil defenses. In: ACM CCS, pp. 805–816 (2015)

  22. Song, J., Lee, S., Kim, J.: Spam filtering in Twitter using sender-receiver relationship. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 301–317. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_16

  23. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In: ACSAC (2010)

  24. Thomas, K., Grier, C., Ma, J., Paxson, V., Song, D.: Design and evaluation of a real-time URL spam filtering service. In: IEEE S & P (2011)

  25. Thomas, K., McCoy, D., Grier, C., Kolcz, A., Paxson, V.: Trafficking fraudulent accounts: the role of the underground market in Twitter spam and abuse. In: USENIX Security Symposium (2013)

  26. Viswanath, B., Post, A., Gummadi, K.P., Mislove, A.: An analysis of social network-based Sybil defenses. In: ACM SIGCOMM (2010)

  27. Wang, A.H.: Don’t follow me - spam detection in Twitter. In: SECRYPT (2010)

  28. Wang, B., Gong, N.Z., Fu, H.: GANG: detecting fraudulent users in online social networks via guilt-by-association on directed graphs. In: IEEE ICDM (2017)

  29. Wang, B., Jia, J., Zhang, L., Gong, N.Z.: Structure-based Sybil detection in social networks via local rule-based propagation. IEEE Transactions on Network Science and Engineering (2018)

  30. Wang, B., Zhang, L., Gong, N.Z.: SybilSCAR: Sybil detection in online social networks via local rule based propagation. In: IEEE INFOCOM (2017)

  31. Wang, G., Konolige, T., Wilson, C., Wang, X.: You are how you click: clickstream analysis for Sybil detection. In: Usenix Security (2013)

  32. Wang, G., et al.: Social turing tests: crowdsourcing Sybil detection. In: NDSS (2013)

  33. Wei, W., Xu, F., Tan, C., Li, Q.: SybilDefender: defend against Sybil attacks in large social networks. In: IEEE INFOCOM (2012)

  34. Wilson, C., Boe, B., Sala, A., Puttaswamy, K.P., Zhao, B.Y.: User interactions in social networks and their implications. In: EuroSys (2009)

  35. Yang, C., Harkreader, R.C., Gu, G.: Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 318–337. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23644-0_17

  36. Yang, C., Harkreader, R., Zhang, J., Shin, S., Gu, G.: Analyzing spammer’s social networks for fun and profit. In: WWW (2012)

  37. Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network Sybils in the wild. In: IMC (2011)

  38. Yu, H., Gibbons, P.B., Kaminsky, M., Xiao, F.: SybilLimit: a near-optimal social network defense against Sybil attacks. In: IEEE S & P (2008)

  39. Yu, H., Kaminsky, M., Gibbons, P.B., Flaxman, A.: SybilGuard: defending against Sybil attacks via social networks. In: ACM SIGCOMM (2006)

Acknowledgements

We thank the anonymous reviewers and our shepherd Jason Polakis for their constructive comments. This work was supported by NSF under grant CNS-1750198 and a research gift from JD.com.

Author information

Corresponding author

Correspondence to Binghui Wang.

Appendices

A Performance of the Average Aggregator

Theorem 2

When SybilBlind uses the average aggregator, the expected aggregated probability is 0.5 for every node.

Proof

Suppose in some sampling trial, the sampled subsets are B and S, and SybilSCAR halts after T iterations. We denote by \(q_u\) the prior probability and by \({p_u}^{(t)}\) the probability in the tth iteration for u, respectively. Note that the subsets \(B'=S\) and \(S'=B\) are sampled by the sampler with the same probability. We denote by \(q_u'\) the prior probability and by \({p_u}^{(t)'}\) the probability in the tth iteration for u, respectively, when SybilSCAR uses the subsets \(B'\) and \(S'\). We prove that \({q}_u'=1 -{q}_u\) and \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) for every node u and iteration t. First, we have:

$$\begin{aligned} {q}_u' = {\left\{ \begin{array}{ll} 0.5 - \theta &{}= 1-{q}_u \,\,\text { if } u \in S \\ 0.5 + \theta &{}= 1-{q}_u \,\, \text { if } u \in B \\ 0.5 &{}= 1-{q}_u \,\,\text { otherwise,} \end{array}\right. } \end{aligned}$$

which means that \({{q}_u}'=1-{q}_u\) for every node.

We have \({p_u}^{(0)'} = {q_u}'\) and \({p_u}^{(0)} = {q_u}\). Therefore, \({p}_u^{(0)'} =1 -{p}_u^{(0)}\) holds for every node in the 0th iteration. We can also show that \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) holds for every node in the tth iteration if \({p}_u^{(t-1)'} = 1-{p}_u^{(t-1)}\) holds for every node. Therefore, \({p}_u^{(t)'} = 1-{p}_u^{(t)}\) holds for every node u and iteration t. As a result, with the sampled subsets \(B'\) and \(S'\), SybilSCAR also halts after T iterations. Moreover, the average probability in the two sampling trials (i.e., the sampled subsets are B and S, and \(B'=S\) and \(S'=B\)) is 0.5 for every node. For each pair of sampled subsets B and S, there is a pair of subsets \(B'=S\) and \(S'=B\) that are sampled by our sampler with the same probability. Therefore, the expected aggregated probability is 0.5 for every node.
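The symmetry argument can be checked numerically on a toy graph. The sketch below is not SybilSCAR itself: it assumes a simplified linear update \(p_u^{(t)} = q_u + 2(w-0.5)\sum_{v \in \Gamma (u)} (p_v^{(t-1)} - 0.5)\), where the neighbor set \(\Gamma (u)\), the weight w, the six-node graph, and the value of \(\theta \) are all hypothetical choices made for illustration. Under that assumed update, the inductive step is one line: \(p_u^{(t)'} = (1-q_u) - 2(w-0.5)\sum _{v \in \Gamma (u)} (p_v^{(t-1)} - 0.5) = 1 - p_u^{(t)}\).

```python
# Toy check of the symmetry argument behind Theorem 2.
# ASSUMPTION: the update rule is a simplified, linearized stand-in for
# SybilSCAR; the graph, theta, and w are hypothetical illustration values.

def propagate(adj, prior, w=0.6, iters=5):
    """Iterate p_u <- q_u + 2*(w - 0.5) * sum over neighbors v of (p_v - 0.5)."""
    p = dict(prior)
    for _ in range(iters):
        p = {
            u: prior[u] + 2 * (w - 0.5) * sum(p[v] - 0.5 for v in adj[u])
            for u in adj
        }
    return p


def make_prior(nodes, benign, sybil, theta=0.2):
    """Prior is 0.5 - theta for sampled-benign nodes, 0.5 + theta for
    sampled-Sybil nodes, and 0.5 for every other node."""
    prior = {}
    for u in nodes:
        if u in benign:
            prior[u] = 0.5 - theta
        elif u in sybil:
            prior[u] = 0.5 + theta
        else:
            prior[u] = 0.5
    return prior


# Two triangles joined by one edge (undirected adjacency lists).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
B, S = {0, 4}, {1, 5}

p = propagate(adj, make_prior(adj, B, S))          # trial with (B, S)
p_swapped = propagate(adj, make_prior(adj, S, B))  # trial with (B' = S, S' = B)

# Average aggregator over the two mirrored trials.
average = {u: (p[u] + p_swapped[u]) / 2 for u in adj}
print(average)
```

Because the swapped trial mirrors every probability around 0.5, the two-trial average carries no information about any node, which is the reason the average aggregator is uninformative in expectation.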

B Proof of Theorem 1

Lower Bound: We have:

$$\begin{aligned} \text {Pr}(\alpha _b \le \tau , \alpha _s \le \tau )&\ge \text {Pr}(\alpha _b=\alpha _s = 0) =(1-r)^n r^n. \end{aligned}$$
(4)

We note that this lower bound is very loose because it ignores all cases in which the label noise is nonzero but still at most \(\tau \), i.e., \(0<\alpha _b \le \tau \) or \(0<\alpha _s \le \tau \). However, it is sufficient to give a qualitative understanding.

Upper Bound: The probability that the label noise is at most \(\tau \) in both the benign region and the Sybil region cannot exceed the probability that it is at most \(\tau \) in either region alone. Formally, we have:

$$\begin{aligned} \text {Pr}(\alpha _b \le \tau , \alpha _s \le \tau ) \le \min \{\text {Pr}(\alpha _b \le \tau ), \text {Pr}( \alpha _s \le \tau ) \} \end{aligned}$$
(5)

Next, we will bound the probabilities \(\text {Pr}(\alpha _b \le \tau )\) and \(\text {Pr}( \alpha _s \le \tau )\) separately. We will take \(\text {Pr}(\alpha _b \le \tau )\) as an example to show the derivations, and similar derivations can be used to bound \(\text {Pr}( \alpha _s \le \tau )\).

We observe the following equivalent equations:

$$\begin{aligned} \text {Pr}(\alpha _b \le \tau )&=\text {Pr}(\frac{n_{sb}}{n_{sb} + n_{bb} } \le \tau ) =\text {Pr}(\tau n_{bb} + (\tau - 1) n_{sb} \ge 0) \end{aligned}$$
(6)

We define n random variables \(X_1, X_2, \cdots , X_n\) and n random variables \(Y_1, Y_2, \cdots , Y_n\) as follows:

$$\begin{aligned}&X_i = {\left\{ \begin{array}{ll} \tau &{}\, \, \, \, \,\,\, \text { if the } i \text {th node in B is benign} \\ 0 &{}\, \, \, \, \,\,\, \text { otherwise} \\ \end{array}\right. } \\&Y_i = {\left\{ \begin{array}{ll} \tau -1 &{}\text { if the } i \text {th node in S is benign} \\ 0 &{}\text { otherwise,} \end{array}\right. } \end{aligned}$$

where \(i=1,2,\cdots , n\). By these definitions, \(\text {Pr}(X_i=\tau )=1 - r\) and \(\text {Pr}(Y_i=\tau - 1)=1 - r\) for \(i=1,2,\cdots , n\). Moreover, we denote by S the sum of these random variables, i.e., \(S=\sum _{i=1}^n X_i + \sum _{i=1}^n Y_i\) (in the remainder of this proof, S refers to this sum rather than to the sampled subset S). Then \(S = \tau n_{bb} + (\tau - 1) n_{sb}\), and its expected value is \(E(S)=-(1-2\tau )(1-r)n\). With S and E(S), we can rewrite Eq. 6 as follows:

$$\begin{aligned} \text {Pr}(\alpha _b \le \tau )= \text {Pr}( S -E(S) \ge -E(S)) \end{aligned}$$

According to Hoeffding’s inequality [17], we have

$$\begin{aligned} \text {Pr}( S -E(S) \ge -E(S))&\le \text {exp}\Big (-\frac{2E^2(S)}{(\tau ^2 + (1-\tau )^2)n}\Big ) =\text {exp}\Big (-\frac{2(1-2\tau )^2(1-r)^2n}{\tau ^2 + (1-\tau )^2}\Big ) \end{aligned}$$
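As a hedged aside, the substitution behind this bound can be spelled out. The display assumes the standard form of Hoeffding's inequality for independent variables \(Z_i \in [a_i, b_i]\) and a threshold \(t > 0\); the symbols \(Z_i\), \(a_i\), \(b_i\), and t are introduced here only for illustration. Each \(X_i\) lies in \([0, \tau ]\) and each \(Y_i\) lies in \([\tau - 1, 0]\), so their ranges are \(\tau \) and \(1-\tau \):

$$\begin{aligned} \text {Pr}\Big (\sum _i Z_i - E\Big (\sum _i Z_i\Big ) \ge t\Big )&\le \text {exp}\Big (-\frac{2t^2}{\sum _i (b_i - a_i)^2}\Big ), \\ \sum _i (b_i - a_i)^2&= n\tau ^2 + n(1-\tau )^2, \\ t = -E(S)&= -\big (n\tau (1-r) + n(\tau -1)(1-r)\big ) = (1-2\tau )(1-r)n. \end{aligned}$$

The requirement \(t > 0\) holds whenever \(\tau < 0.5\) and \(r < 1\). Substituting gives the exponent \(-2(1-2\tau )^2(1-r)^2 n^2 / \big (n(\tau ^2 + (1-\tau )^2)\big )\), which simplifies to the expression in the bound.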

Similarly, we can derive an upper bound of \(\text {Pr}( \alpha _s \le \tau )\) as follows:

$$\begin{aligned} \text {Pr}( \alpha _s \le \tau )&\le \text {exp}\Big (-\frac{2(1-2\tau )^2 r^2 n}{\tau ^2 + (1-\tau )^2}\Big ) \end{aligned}$$
(7)

Since we consider \(r<0.5\) in this work, we have \(1-r>r\), so the bound on \(\text {Pr}(\alpha _b \le \tau )\) is the smaller of the two. Hence:

$$\begin{aligned} \min \{\text {Pr}(\alpha _b \le \tau ), \text {Pr}( \alpha _s \le \tau ) \} \le \text {exp}\Big (-\frac{2(1-2\tau )^2(1-r)^2n}{\tau ^2 + (1-\tau )^2}\Big ) \end{aligned}$$
(8)

By combining Eqs. 5 and 8, we obtain Eq. 3.
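The two bounds can be sanity-checked against an exact computation for small n. The sketch below assumes, as the proof does, that each of the n nodes in B and each of the n nodes in S is independently benign with probability \(1-r\). The condition used for \(\alpha _s\) is an assumed mirror image of Eq. 6 (Sybil counts in place of benign counts), and the function names and the values of n, r, and \(\tau \) are hypothetical.

```python
from math import comb, exp


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_noise_probability(n, r, tau, eps=1e-12):
    """Exact Pr(alpha_b <= tau, alpha_s <= tau) under the i.i.d. model."""
    total = 0.0
    for n_bb in range(n + 1):        # benign nodes that landed in B
        for n_sb in range(n + 1):    # benign nodes that landed in S
            n_bs = n - n_bb          # Sybil nodes that landed in B
            n_ss = n - n_sb          # Sybil nodes that landed in S
            # Eq. 6 form of alpha_b <= tau; eps absorbs rounding at equality.
            ok_b = tau * n_bb + (tau - 1) * n_sb >= -eps
            # ASSUMED analogue for alpha_s = n_bs / (n_bs + n_ss) <= tau.
            ok_s = tau * n_ss + (tau - 1) * n_bs >= -eps
            if ok_b and ok_s:
                total += binom_pmf(n_bb, n, 1 - r) * binom_pmf(n_sb, n, 1 - r)
    return total


def lower_bound(n, r):
    """Eq. 4: all of B benign and all of S Sybil."""
    return (1 - r) ** n * r**n


def upper_bound(n, r, tau):
    """Hoeffding-based bound on Pr(alpha_b <= tau), valid for tau < 0.5."""
    return exp(-2 * (1 - 2 * tau) ** 2 * (1 - r) ** 2 * n
               / (tau**2 + (1 - tau) ** 2))


n, r, tau = 10, 0.3, 0.2
exact = joint_noise_probability(n, r, tau)
low, high = lower_bound(n, r), upper_bound(n, r, tau)
print(f"lower={low:.3e}  exact={exact:.3e}  upper={high:.3e}")
```

For these illustrative values the exact probability falls between the two bounds, with a wide gap on the lower side, which is consistent with the remark that the lower bound is very loose.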

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, B., Zhang, L., Gong, N.Z. (2018). SybilBlind: Detecting Fake Users in Online Social Networks Without Manual Labels. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_11

  • DOI: https://doi.org/10.1007/978-3-030-00470-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00469-9

  • Online ISBN: 978-3-030-00470-5

  • eBook Packages: Computer Science, Computer Science (R0)