
Gollector: Measuring Domain Name Dark Matter from Different Vantage Points

  • Conference paper
Secure IT Systems (NordSec 2021)

Abstract

This paper proposes Gollector, a novel tool for measuring the domain name space from different vantage points. Whereas such measurements have typically been conducted from a single vantage point (or only a few), Gollector combines multiple measurements in a single system. It allows us to express the relative difference in the covered domain name space, as well as its temporal characteristics, as domain name dark matter. We apply the tool to a three-week trace from four vantage points in three security-related use cases: early detection of domain registrations, data leakage in a split-horizon situation, and a proposed method for subdomain enumeration. We release the Gollector source code to the research community to support future research in this field.
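One way to make "relative difference in the covered domain name space" concrete is to compare the sets of domain names each vantage point observes. The sketch below is an illustration under our own assumptions, not the paper's exact definition of dark matter: it treats the names seen by one vantage point but by no other as that vantage point's exclusive contribution, and uses the Jaccard similarity to compare pairs of vantage points. The vantage point labels and sample names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def exclusive_domains(views: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each vantage point, the names that no other vantage point observed.

    Reading these as that vantage point's 'dark matter' share is an
    assumption made for illustration.
    """
    result = {}
    for name, domains in views.items():
        others = set().union(*(d for n, d in views.items() if n != name))
        result[name] = domains - others
    return result


def pairwise_similarity(views: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of vantage points."""
    return {(a, b): jaccard(views[a], views[b])
            for a, b in combinations(sorted(views), 2)}


# Hypothetical observations from three vantage points.
views = {
    "ct_logs": {"a.example.com", "b.example.com", "new.example.org"},
    "zone_files": {"example.com", "example.org"},
    "passive_dns": {"example.com", "a.example.com", "intranet.example.com"},
}
print(exclusive_domains(views))
print(pairwise_similarity(views))
```

Set operations keep the comparison independent of how each vantage point collects its data; a time dimension could be added by keying the sets per day.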


Notes

  1. As of July 2021.

  2. The difference between data collected from a network operator's routing device and data collected from a DNS resolver may be insignificant if both vantage points are owned by the same party, as in the case of an ISP.

  3. Using SHA-256.

  4. https://github.com/google/certificate-transparency-community-site/blob/master/docs/google/known-logs.md.

  5. Between February 1st, 2021 and February 21st, 2021.

  6. Registries have access to more accurate registration data than the zone files alone, so this is a limitation only for researchers who have access to nothing but the zone files.

  7. Since a registration is detected by computing the difference between the zone files of two consecutive days, we miss the registrations on the first day of our measurement.

  8. There are potentially many clique covers, and our purpose is not to find a minimal clique cover.
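Note 7 describes detecting registrations by diffing the zone files of two consecutive days. A minimal sketch of that diff, assuming a simplified zone file format (one record per line, fully qualified owner names, no `$ORIGIN` or relative names) and treating every NS-delegated owner name below the TLD apex as a registered domain. The function names and sample records are hypothetical.

```python
from __future__ import annotations


def delegated_domains(zone_text: str, origin: str) -> set[str]:
    """Return owner names of NS records below the zone apex `origin`.

    Simplified parser (an assumption for illustration): one record per line,
    fully qualified owner names, NS target as the last field. Comments after
    ';' are stripped; directives such as $ORIGIN are not supported.
    """
    apex = origin.rstrip(".").lower()
    domains = set()
    for line in zone_text.splitlines():
        fields = line.split(";", 1)[0].split()
        # Expect at least: owner [ttl] [class] NS target
        if len(fields) >= 3 and fields[-2].upper() == "NS":
            owner = fields[0].rstrip(".").lower()
            if owner != apex:  # skip the TLD's own NS set
                domains.add(owner)
    return domains


def detect_registrations(daily_zones: list[str], origin: str) -> list[set[str]]:
    """New delegations per day, computed from consecutive daily snapshots.

    The first snapshot has no predecessor, so its registrations cannot be
    detected: the result has one entry fewer than `daily_zones`.
    """
    snapshots = [delegated_domains(z, origin) for z in daily_zones]
    return [today - yesterday for yesterday, today in zip(snapshots, snapshots[1:])]


day1 = """dk. 86400 IN NS a.nic.dk.
example.dk. 86400 IN NS ns1.example.net.
"""
day2 = day1 + "newshop.dk. 3600 IN NS ns.host.example.\n"
print(detect_registrations([day1, day2], "dk."))  # [{'newshop.dk'}]
```

Passing a single snapshot returns an empty list, which mirrors the first-day gap described in note 7.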


Acknowledgments

This research was carried out under the SecDNS project, funded by Innovation Fund Denmark. We would like to express our gratitude to Finn Büttner and Erwin Lansing for their assistance in collecting our passive DNS datasets.

Author information


Corresponding author

Correspondence to Kaspar Hageman.


Appendices

Appendix A Clique Cover Algorithm

Algorithm 1 describes how we compute a clique cover of graph G. The intuition behind the algorithm is that the two nodes connected by the edge with the largest weight have the highest priority to form a clique. The algorithm iterates over all edges in the graph and assigns each node to a clique based on the interactions observed through the edges. Depending on whether the source and destination nodes of an edge already belong to a clique, the algorithm creates a new clique, adds a node to an existing clique, or merges two cliques. The output of the algorithm is a hashmap that maps each node in the graph to its assigned clique. The implementation includes several optimizations that reduce the number of edges to evaluate.
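The create, add, and merge steps can be sketched as a greedy pass over the edges in descending weight order. This is an illustration, not a reproduction of Algorithm 1: we assume nodes are subdomain labels and weights count shared apexes (hypothetical inputs), and we add an explicit adjacency check before adding or merging so that every group is a true clique. The optimizations mentioned above are not reproduced.

```python
from __future__ import annotations

from collections import defaultdict


def clique_cover(edges):
    """Greedy clique cover of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples; self-loops are ignored.
    Returns a dict mapping every node that appears in an edge to a clique id.
    """
    edges = [(u, v, w) for u, v, w in edges if u != v]
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)

    cliques = {}    # clique id -> set of member nodes
    member_of = {}  # node -> clique id
    next_id = 0

    def fully_connected(node, members):
        # True if `node` is adjacent to every node in `members`.
        return all(m in adj[node] for m in members)

    # Heaviest edges first: their endpoints have the highest priority to share a clique.
    for u, v, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        cu, cv = member_of.get(u), member_of.get(v)
        if cu is None and cv is None:
            # Neither endpoint assigned yet: create a new clique.
            cliques[next_id] = {u, v}
            member_of[u] = member_of[v] = next_id
            next_id += 1
        elif cu is None or cv is None:
            # One endpoint assigned: try to add the other to that clique.
            cid = cu if cu is not None else cv
            newcomer = u if cu is None else v
            if fully_connected(newcomer, cliques[cid]):
                cliques[cid].add(newcomer)
                member_of[newcomer] = cid
        elif cu != cv:
            # Endpoints in different cliques: merge only if the union is a clique.
            if all(fully_connected(n, cliques[cv]) for n in cliques[cu]):
                for n in cliques[cv]:
                    member_of[n] = cu
                cliques[cu] |= cliques.pop(cv)

    # Nodes that never joined a clique get a singleton clique.
    for node in adj:
        if node not in member_of:
            member_of[node] = next_id
            cliques[next_id] = {node}
            next_id += 1
    return member_of


print(clique_cover([("www", "mail", 10), ("www", "ftp", 8), ("mail", "ftp", 7),
                    ("api", "dev", 5), ("ftp", "api", 1)]))
```

In this example "www", "mail" and "ftp" share one clique and "api" and "dev" another; the light "ftp"–"api" edge does not merge them because "www" is not adjacent to "api". As note 8 says, the resulting cover is one of many and is not guaranteed to be minimal.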


Appendix B Examples of Cliques

Table 6 lists several examples of cliques. For each clique, the table gives a general description of what the subdomains may be intended for, the number of subdomains in the clique, the number of apexes associated with these subdomains, and the subdomains that make up the clique.

Table 6. Examples of cliques


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Hageman, K., Hansen, R.R., Pedersen, J.M. (2021). Gollector: Measuring Domain Name Dark Matter from Different Vantage Points. In: Tuveri, N., Michalas, A., Brumley, B.B. (eds) Secure IT Systems. NordSec 2021. Lecture Notes in Computer Science, vol. 13115. Springer, Cham. https://doi.org/10.1007/978-3-030-91625-1_8


  • DOI: https://doi.org/10.1007/978-3-030-91625-1_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91624-4

  • Online ISBN: 978-3-030-91625-1

  • eBook Packages: Computer Science, Computer Science (R0)
