Profiling phishing activity based on hyperlinks extracted from phishing emails

Yearwood, John; Mammadov, Musa; Webb, Dean

doi:10.1007/s13278-011-0031-y

Original Article
Published: 15 June 2011

Volume 2, pages 5–16, (2012)

Social Network Analysis and Mining

John Yearwood¹,
Musa Mammadov¹ &
Dean Webb¹

Abstract

Phishing activity has recently focused on social networking sites as a more effective way to exploit not only the technology but also the trust that may exist between members of a social network. This paper proposes a novel method for profiling phishing activity from an analysis of phishing emails. Profiling is useful in determining the activity of an individual phisher or a particular group of phishers. Work on phishing is usually aimed at detecting phishing emails; here we concentrate on profiling as distinct from detection. We formulate the profiling problem as a multi-label classification problem, using the hyperlinks in the phishing emails as features and, as profile classes, structural properties of the emails along with whois (i.e. DNS) information on the hyperlinks. We then generate profiles from the classifier predictions, so that classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are then used to generate complete profiles of the emails. Results show that profiling can be done with quite high accuracy using hyperlink information.
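
The formulation in the abstract (hyperlinks as features, predicted classes as elements of a profile) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature scheme (`host=` and `path=` tokens), the class names, the per-class scores, and the zero threshold are all assumptions made for illustration. The classifier itself (AdaBoost or SVM in the paper) is not shown; the sketch only assumes it yields one real-valued score per class.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags in an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def hyperlink_features(html_body):
    """Return token features drawn from the hyperlinks (hypothetical scheme)."""
    parser = LinkExtractor()
    parser.feed(html_body)
    features = set()
    for link in parser.links:
        parts = urlparse(link)
        if parts.hostname:
            features.add("host=" + parts.hostname)
        for segment in parts.path.split("/"):
            if segment:
                features.add("path=" + segment)
    return features


def build_profile(scores, threshold=0.0):
    """Keep every class scoring above the threshold; those classes form the profile."""
    return {label for label, score in scores.items() if score > threshold}


# Illustrative email body; the domain is a reserved test name, not real data.
email = '<p>Verify: <a href="http://login.example.test/secure/update.php">here</a></p>'
features = hyperlink_features(email)

# Hypothetical per-class scores, standing in for multi-label classifier output.
scores = {"has_html_form": 0.8, "registrar_A": -0.3, "country_X": 0.4}
profile = build_profile(scores)
```

Because each class is scored independently, one email can receive several classes at once, which is what makes the problem multi-label rather than single-label.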

Author information

Authors and Affiliations

Graduate School of ITMS, University of Ballarat, Ballarat, VIC, Australia
John Yearwood, Musa Mammadov & Dean Webb

Corresponding author

Correspondence to Musa Mammadov.

About this article

Cite this article

Yearwood, J., Mammadov, M. & Webb, D. Profiling phishing activity based on hyperlinks extracted from phishing emails. Soc. Netw. Anal. Min. 2, 5–16 (2012). https://doi.org/10.1007/s13278-011-0031-y

Received: 28 October 2010
Revised: 19 February 2011
Accepted: 28 May 2011
Published: 15 June 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s13278-011-0031-y
