Abstract
Phishing activity has recently focused on social networking sites, which offer a more effective way of exploiting not only the technology but also the trust that may exist between members of a social network. This paper proposes a novel method for profiling phishing activity from an analysis of phishing emails. Profiling is useful in determining the activity of an individual phisher or a particular group of phishers. Work on phishing usually aims at detecting phishing emails; here we concentrate on profiling as distinct from detection. We formulate the profiling problem as a multi-label classification problem, using the hyperlinks in the phishing emails as features and the structural properties of the emails, along with whois (i.e. DNS) information on the hyperlinks, as profile classes. We then generate profiles from the classifier predictions, so that classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as a support vector machine (SVM) to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails, and use these predictions to generate complete profiles of the emails. Results show that profiling can be done with quite high accuracy using hyperlink information.
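The formulation in the abstract (hyperlink features in, a set of profile classes out) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary-relevance decomposition (one independent classifier per class) and plain discrete AdaBoost over feature presence/absence stumps, neither of which is specified in the abstract. The function names, the feature scheme, the class names such as `whois:registrar_A`, and the four training emails are all hypothetical toy values chosen for illustration.

```python
import math
import re
from urllib.parse import urlparse

HREF_RE = re.compile(r"""href=["'](.*?)["']""", re.IGNORECASE)

def hyperlink_features(email_body):
    """Turn the hyperlinks of one email into a set of binary feature names."""
    feats = set()
    for url in HREF_RE.findall(email_body):
        parsed = urlparse(url)
        host = parsed.hostname or ""
        feats.add("host=" + host)
        if re.fullmatch(r"\d+(\.\d+){3}", host):
            feats.add("host_is_ip")  # link points at a raw IPv4 address
        for token in re.split(r"[/._\-?=&]+", parsed.path.lower()):
            if token:
                feats.add("path=" + token)
    return feats

def stump(feat, polarity, x):
    """Weak learner: vote +1/-1 on whether a feature is present in x."""
    return polarity * (1 if feat in x else -1)

def train_adaboost(samples, targets, rounds=10):
    """Discrete AdaBoost for one class; targets are +1 / -1."""
    vocab = sorted(set().union(*samples))
    model = []  # list of (feature, polarity, alpha)
    if not vocab:
        return model
    weights = [1.0 / len(samples)] * len(samples)
    for _ in range(rounds):
        best = None
        for feat in vocab:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(samples, targets, weights)
                          if stump(feat, polarity, x) != y)
                if best is None or err < best[0]:
                    best = (err, feat, polarity)
        err, feat, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((feat, polarity, alpha))
        # Re-weight so misclassified emails count more in the next round.
        weights = [w * math.exp(-alpha * y * stump(feat, polarity, x))
                   for x, y, w in zip(samples, targets, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the weighted vote of all stumps."""
    return sum(a * stump(f, p, x) for f, p, a in model) > 0

def train_profiler(samples, label_sets, rounds=10):
    """Binary relevance: one boosted classifier per profile class."""
    labels = sorted(set().union(*label_sets))
    return {lab: train_adaboost(
                samples, [1 if lab in ls else -1 for ls in label_sets], rounds)
            for lab in labels}

def profile(profiler, x):
    """A profile is the set of classes predicted for one email."""
    return {lab for lab, model in profiler.items() if predict(model, x)}

# Hypothetical toy data: (email body, set of profile classes).
TRAINING = [
    ('<a href="http://192.0.2.7/bank/login.php">x</a>',
     {"whois:no_record", "structure:form"}),
    ('<a href="http://192.0.2.9/bank/info.php">x</a>',
     {"whois:no_record"}),
    ('<a href="http://secure.example.test/account/login.html">x</a>',
     {"whois:registrar_A", "structure:form"}),
    ('<a href="http://www.example.test/account/info.html">x</a>',
     {"whois:registrar_A"}),
]

profiler = train_profiler([hyperlink_features(body) for body, _ in TRAINING],
                          [labels for _, labels in TRAINING])

unseen = '<a href="http://198.51.100.4/bank/login.php">x</a>'
print(sorted(profile(profiler, hyperlink_features(unseen))))
```

Each class is learned independently, so the predicted profile is simply the union of the classes whose classifier votes positive. A multi-label booster such as AdaBoost.MH, or a per-class SVM, could be substituted for `train_adaboost` without changing how the profile is assembled.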
Cite this article
Yearwood, J., Mammadov, M. & Webb, D. Profiling phishing activity based on hyperlinks extracted from phishing emails. Soc. Netw. Anal. Min. 2, 5–16 (2012). https://doi.org/10.1007/s13278-011-0031-y