An On-Line Learning Statistical Model to Detect Malicious Web Requests

  • Conference paper
Security and Privacy in Communication Networks (SecureComm 2011)

Abstract

Detecting malicious connection attempts and attacks against web-based applications is one of many approaches to protect the World Wide Web and its users.

In this paper, we present a generic method for detecting anomalous and potentially malicious web requests from the network’s point of view, without prior knowledge of the web-based application or training data from it. The algorithm assumes that a legitimate request is an ordered sequence of semantic entities. Malicious requests either arrange these entities in a different order or include entities that deviate from the structure of the majority of requests. Our method learns a variable-order Markov model from legitimate sequences of semantic entities. If a sequence’s probability deviates from those of previously seen sequences, it is reported as anomalous.
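The idea in the paragraph above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the class name `BackoffMarkovModel`, the parameters `max_order`, `escape`, and `floor`, and the example URIs are all hypothetical, and the crude fixed-penalty back-off used here stands in for a properly normalized variable-order estimator. Each request is turned into a sequence of coarse tokens, context counts are updated one request at a time, and a request is scored by its mean negative log-probability per token, so a higher score means a more unusual sequence.

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def tokenize(request_uri):
    """Split a request URI into coarse tokens (hypothetical scheme, not the paper's)."""
    parts = urlsplit(request_uri)
    tokens = ["PATH:" + seg for seg in parts.path.split("/") if seg]
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        kind = "NUM" if value.isdigit() else "STR"
        tokens.append("ARG:" + key + "=" + kind)
    return tokens + ["END"]


class BackoffMarkovModel:
    """Context counts up to max_order, with a fixed-penalty back-off (assumption)."""

    def __init__(self, max_order=2, escape=0.1, floor=1e-6):
        self.max_order = max_order
        self.escape = escape      # penalty applied each time the context is shortened
        self.floor = floor        # probability assigned to a never-seen token
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, tokens):
        """On-line update: count each token under every context of length 0..max_order."""
        for i, symbol in enumerate(tokens):
            for order in range(min(i, self.max_order) + 1):
                context = tuple(tokens[i - order:i])
                self.counts[context][symbol] += 1

    def _prob(self, context, symbol):
        """Use the longest context in which the symbol was seen, shortening as needed."""
        penalty = 1.0
        while True:
            seen = self.counts.get(context)
            if seen and symbol in seen:
                return penalty * seen[symbol] / sum(seen.values())
            if not context:
                return penalty * self.floor
            context = context[1:]
            penalty *= self.escape

    def score(self, tokens):
        """Mean negative log-probability per token; higher means more anomalous."""
        total = 0.0
        for i, symbol in enumerate(tokens):
            context = tuple(tokens[max(0, i - self.max_order):i])
            total += math.log(self._prob(context, symbol))
        return -total / len(tokens)


# Hypothetical traffic: three ordinary requests, then one ordinary and one injected probe.
model = BackoffMarkovModel()
for uri in ["/profile/view?id=1", "/profile/view?id=2", "/profile/edit?id=3&name=bob"]:
    model.update(tokenize(uri))

normal_score = model.score(tokenize("/profile/view?id=7"))
attack_score = model.score(tokenize("/profile/view?id=1%27%20OR%201=1"))
print(normal_score, attack_score)
```

In this toy run the injected request scores higher than the ordinary one, because its `id` argument is no longer numeric and produces a token the model has never seen in that context. Deciding when a score counts as a deviation (for example, by comparing it against running statistics of earlier scores) is left out of the sketch.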

Experiments were conducted on logs from a social networking web site. The results indicate that the proposed method achieves good detection rates at acceptable false-alarm rates.

Copyright information

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Lampesberger, H., Winter, P., Zeilinger, M., Hermann, E. (2012). An On-Line Learning Statistical Model to Detect Malicious Web Requests. In: Rajarajan, M., Piper, F., Wang, H., Kesidis, G. (eds) Security and Privacy in Communication Networks. SecureComm 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 96. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31909-9_2

  • DOI: https://doi.org/10.1007/978-3-642-31909-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31908-2

  • Online ISBN: 978-3-642-31909-9

  • eBook Packages: Computer Science, Computer Science (R0)
