Clustering Blockchain Data

Clustering Methods for Big Data Analytics

Part of the book series: Unsupervised and Semi-Supervised Learning (UNSESUL)

Abstract

Blockchain datasets, such as those generated by the popular cryptocurrencies Bitcoin, Ethereum, and others, are intriguing examples of big data. Analysis of these datasets has diverse applications, such as detecting fraud and illegal transactions, characterizing major services, identifying financial hotspots, and characterizing the usage and performance of large peer-to-peer consensus-based systems. Unsupervised learning methods in general, and clustering methods in particular, hold the potential to discover unanticipated patterns leading to valuable insights. However, the volume, velocity, and variety of blockchain data, as well as the difficulties in evaluating results, pose significant challenges to the efficient and effective application of clustering methods to blockchain data. Nevertheless, recent and ongoing work has adapted classic methods and developed new methods tailored to the characteristics of such data. This chapter motivates the study of clustering methods for blockchain data and introduces the key blockchain concepts from a data-centric perspective. It presents different models and methods used for clustering blockchain data, and describes the challenges of evaluating such methods along with some solutions.

Acknowledgements

This work was supported in part by the US National Science Foundation grants EAR-1027960 and PLR-1142007. Several improvements resulted from detailed feedback from the reviewers.

Author information

Corresponding author

Correspondence to Sudarshan S. Chawathe.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chawathe, S.S. (2019). Clustering Blockchain Data. In: Nasraoui, O., Ben N'Cir, CE. (eds) Clustering Methods for Big Data Analytics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_3

  • DOI: https://doi.org/10.1007/978-3-319-97864-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97863-5

  • Online ISBN: 978-3-319-97864-2

  • eBook Packages: Engineering, Engineering (R0)