Profiling phishing activity based on hyperlinks extracted from phishing emails

  • Original Article
Social Network Analysis and Mining

Abstract

Phishing activity has recently focused on social networking sites as a more effective way of exploiting not only the technology but also the trust that may exist between members of a social network. In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual phisher or of a particular group of phishers. Work in the area of phishing is usually aimed at the detection of phishing emails; in this paper, we concentrate on profiling as distinct from detection. We formulate the profiling problem as a multi-label classification problem, using the hyperlinks in the phishing emails as features and the structural properties of the emails, together with WHOIS (domain registration) information on the hyperlinks, as profile classes. We then generate profiles from the classifier predictions, so that the predicted classes become the elements of a profile. We employ a boosting algorithm (AdaBoost) as well as support vector machines (SVM) to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are then used to generate complete profiles of the emails. Results show that profiling can be done with high accuracy using hyperlink information.
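The abstract does not spell out the full pipeline, so the following is only a minimal sketch under stated assumptions. Hyperlinks are turned into binary token features (host, top-level domain, whether the host is an IPv4 literal, path tokens); this feature scheme is hypothetical. The multi-label problem is handled by binary relevance, training one discrete AdaBoost model over one-feature decision stumps per profile class. That is a swapped-in simplification and not necessarily the authors' setup: multi-label boosting such as AdaBoost.MH treats all labels jointly. The class names (`whois_country=RU`, `lure=verify`, `lure=update`), the helper functions, and the toy emails are all invented for illustration. A profile is the set of classes whose boosted score is positive.

```python
import math
import re

# Hypothetical hyperlink pattern: scheme, then host, then the rest of the URL.
URL_RE = re.compile(r"https?://([^/\s\"'<>]+)([^\s\"'<>]*)", re.IGNORECASE)
IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def hyperlink_features(email_body):
    """Map the hyperlinks in an email body to a set of binary token features (hypothetical scheme)."""
    feats = set()
    for host, path in URL_RE.findall(email_body):
        # Drop any user-info and port, normalise case and a trailing dot.
        host = host.rsplit("@", 1)[-1].split(":")[0].lower().rstrip(".")
        feats.add("host=" + host)
        if IPV4_RE.fullmatch(host):
            feats.add("host_is_ip")
        else:
            feats.add("tld=" + host.rsplit(".", 1)[-1])
        for part in re.split(r"[/?=&.\-_]+", path.lower()):
            if part:
                feats.add("path=" + part)
    return feats


def stump(feature, polarity, x):
    """One-feature decision stump: +polarity if the feature is present, else -polarity."""
    return polarity if feature in x else -polarity


def train_adaboost(X, y, vocab, rounds=10):
    """Discrete AdaBoost over stumps; X is a list of feature sets, y holds labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for f in vocab:
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(f, p, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, f, p)
        err, f, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, p))
        # Misclassified examples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump(f, p, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of the boosted stumps; a positive score means the class is predicted."""
    return sum(alpha * stump(f, p, x) for alpha, f, p in model)


def train_profiler(emails, label_sets, rounds=10):
    """Binary relevance: train one AdaBoost model per profile class."""
    X = [hyperlink_features(e) for e in emails]
    vocab = sorted(set().union(*X))
    classes = sorted(set().union(*label_sets))
    return {
        c: train_adaboost(X, [1 if c in ls else -1 for ls in label_sets], vocab, rounds)
        for c in classes
    }


def predict_profile(models, email):
    """A profile is the set of classes whose boosted score is positive."""
    x = hyperlink_features(email)
    return {c for c, m in models.items() if score(m, x) > 0}


# Toy, invented training data (not from the paper): emails and their profile classes.
TRAIN_EMAILS = [
    "Verify your account at http://paypa1-secure.example.ru/login/verify.php",
    "Update details: http://192.0.2.10/bank/update.html",
    "Confirm now http://secure-login.example.ru/account/verify",
    "Your parcel: http://198.51.100.7/track/update.html",
]
TRAIN_LABELS = [
    {"whois_country=RU", "lure=verify"},
    {"lure=update"},
    {"whois_country=RU", "lure=verify"},
    {"lure=update"},
]

models = train_profiler(TRAIN_EMAILS, TRAIN_LABELS)
print(sorted(predict_profile(models, "Please http://login.example.ru/verify")))
```

Binary relevance treats each profile class independently and therefore ignores correlations between classes. The SVM alternative mentioned in the abstract could be tried by replacing `train_adaboost` and `score` with a per-class linear SVM. In a real setting, WHOIS-derived classes such as a registrant country would come from WHOIS lookups on the training hyperlinks rather than being hand-labelled as here.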

Author information

Corresponding author

Correspondence to Musa Mammadov.

About this article

Cite this article

Yearwood, J., Mammadov, M. & Webb, D. Profiling phishing activity based on hyperlinks extracted from phishing emails. Soc. Netw. Anal. Min. 2, 5–16 (2012). https://doi.org/10.1007/s13278-011-0031-y

  • DOI: https://doi.org/10.1007/s13278-011-0031-y
