Hidden Markov Models for Automated Protocol Learning

  • Conference paper
Security and Privacy in Communication Networks (SecureComm 2010)

Abstract

Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is the selection of appropriate model parameters, which is often ad hoc or requires domain-specific knowledge. While algorithms exist to find local optima for some parameters, the number of states must always be specified and directly impacts the accuracy and generality of the model. In addition, domain knowledge is not always available or may be based on assumptions that prove incorrect or sub-optimal.

We apply the ε-machine, a special type of HMM, to the task of constructing network protocol models solely from network traffic. Unlike previous approaches, ε-machine reconstruction infers the minimal HMM architecture directly from data and is well suited to applications such as anomaly detection. We draw distinctions between our approach and previous research, and discuss the benefits and challenges of ε-machines for protocol model inference.
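The idea of inferring the number of states from data, rather than fixing it in advance, can be illustrated with a minimal sketch. It is not the reconstruction algorithm used in the paper: it simply groups fixed-length histories whose estimated next-symbol distributions agree within a tolerance, which is a crude stand-in for the statistical tests a real reconstruction would need. The function names, the `tol` parameter, and the `REQ`/`REP` message types are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(sequence, history_len):
    """Estimate P(next symbol | preceding history of length history_len)."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(sequence)):
        history = tuple(sequence[i - history_len:i])
        counts[history][sequence[i]] += 1
    return {
        history: {sym: n / sum(c.values()) for sym, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(dists, tol=0.05):
    """Greedily merge histories whose predictive distributions differ by at most tol.

    Each resulting group plays the role of one candidate state. This is an
    illustrative simplification (assumption), not a full reconstruction.
    """
    groups = []  # list of (representative distribution, member histories)
    for history, dist in sorted(dists.items()):
        for rep, members in groups:
            symbols = set(rep) | set(dist)
            if all(abs(rep.get(s, 0.0) - dist.get(s, 0.0)) <= tol for s in symbols):
                members.append(history)
                break
        else:
            groups.append((dist, [history]))
    return [members for _, members in groups]


# Hypothetical trace of message types from a strict request/reply protocol.
trace = ["REQ", "REP"] * 50

for k in (1, 3):
    states = group_histories(next_symbol_distributions(trace, k))
    print(k, len(states), states)
```

In this toy trace the grouping settles on the same number of candidate states for both history lengths, without that number being supplied as a parameter. Real traffic is noisy, so a fixed tolerance would need to be replaced by a proper significance test.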

Copyright information

© 2010 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Whalen, S., Bishop, M., Crutchfield, J.P. (2010). Hidden Markov Models for Automated Protocol Learning. In: Jajodia, S., Zhou, J. (eds) Security and Privacy in Communication Networks. SecureComm 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16161-2_24

  • DOI: https://doi.org/10.1007/978-3-642-16161-2_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16160-5

  • Online ISBN: 978-3-642-16161-2

  • eBook Packages: Computer Science, Computer Science (R0)
