Hidden Markov Models for Automated Protocol Learning

Whalen, Sean; Bishop, Matt; Crutchfield, James P.

doi:10.1007/978-3-642-16161-2_24

Sean Whalen^17,18,
Matt Bishop¹⁷ &
James P. Crutchfield^17,18,19

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ((LNICST,volume 50))

Included in the following conference series:

International Conference on Security and Privacy in Communication Systems

1887 Accesses
7 Citations

Abstract

Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is the selection of appropriate model parameters, which is often ad hoc or requires domain-specific knowledge. While algorithms exist to find local optima for some parameters, the number of states must always be specified and directly impacts the accuracy and generality of the model. In addition, domain knowledge is not always available or may be based on assumptions that prove incorrect or sub-optimal.

We apply the ε-machine—a special type of HMM—to the task of constructing network protocol models solely from network traffic. Unlike previous approaches, ε-machine reconstruction infers the minimal HMM architecture directly from data and is well suited to applications such as anomaly detection. We draw distinctions between our approach and previous research, and discuss the benefits and challenges of ε-machine for protocol model inference.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Erman, J., Mahanti, A., Arlitt, M.: Internet traffic identification using machine learning. In: Proceedings of the 49th IEEE Global Telecommunications Conference, pp. 1–6 (2006)
Google Scholar
Rabiner, L.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286 (1989)
Article Google Scholar
Crutchfield, J.P., Young, K.: Inferring statistical complexity. Phys. Rev. Let. 63 (1989); Crutchfield, J.P.: Physica D 75 11–54 (1994); Crutchfield, J. P., Shalizi, C. R.: Phys. Rev. E 59(1), 275–283, 105–108 (1999)
Google Scholar
Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley Interscience, New York (2006)
MATH Google Scholar
Beddoe, M.: Network protocol analysis using bioinformatics algorithms. Technical report, McAfee Inc. (2005)
Google Scholar
Cui, W., Paxson, V., Weaver, N., Katz, R.: Protocol-independent adaptive replay of application dialog. In: Proceedings of the 13th Annual Symposium on Network and Distributed System Security (2006)
Google Scholar
Cui, W., Kannan, J., Wang, H.: Discoverer: Automatic protocol reverse engineering from network traces. In: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, pp. 1–14 (2007)
Google Scholar
Lin, Z., Jiang, X., Xu, D., Zhang, X.: Automatic protocol format reverse engineering through context-aware monitored execution. In: Proceedings of the 15th Annual Network and Distributed System Security Symposium (2008)
Google Scholar
Wondracek, G., Milani Comparetti, P., Kruegel, C., Kirda, E.: Automatic network protocol analysis. In: Proceedings of the 15th Symposium on Network and Distributed System Security (2008)
Google Scholar
Caballero, J., Poosankam, P., Kreibich, C., Song, D.: Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering. In: Proceedings of the 16th ACM conference on Computer and Communications Security, pp. 621–634 (2009)
Google Scholar
Leita, C., Mermoud, K., Dacier, M.: Scriptgen: An automated script generation tool for honeyd. In: Proceedings of the 21st Annual Computer Security Applications Conference, pp. 203–214 (2005)
Google Scholar
Milani Comparetti, P., Wondracek, G., Kruegel, C., Kirda, E.: Prospex: Protocol specification extraction. In: IEEE Symposium on Security and Privacy (2009)
Google Scholar
Norris, J.R.: Markov Chains. Cambridge University Press, Cambridge (1997)
Book MATH Google Scholar
Crutchfield, J., Feldman, D.: Regularities unseen, randomness observed: Levels of entropy convergence. Chaos 15, 25–54 (2003)
Article MathSciNet MATH Google Scholar
Shalizi, C.R., Shalizi, K.L.: Blind construction of optimal nonlinear recursive predictors for discrete sequences. In: Proceedings of the 20th conference on Uncertainty in Artificial Intelligence, pp. 504–511 (2004)
Google Scholar
Shalizi, C., Shalizi, K., Crutchfield, J.: Pattern discovery in time series, Part I: Theory, algorithm, analysis, and convergence, 2002 Santa Fe Institute Working Paper 02-10-060; arXiv.org/abs/cs.LG/0210025
Google Scholar
Li, H., Zhang, K., Jiang, T.: Minimum entropy clustering and applications to gene expression analysis. In: Computational Systems Bioinformatics Conference, International IEEE Computer Society, pp. 142–151 (2004)
Google Scholar
Postel, J.: Internet Control Message Protocol (1981), Updated by RFCs 950, 4884
Google Scholar
Modbus Organization: Modbus Messaging Implementation Guide 1.0b (2006)
Google Scholar
Bugalho, M., Oliveira, A.L.: Inference of regular languages using state merging algorithms with search. Pattern Recognition 38 (2005)
Google Scholar
Godefroid, P.: Random testing for security: blackbox vs. whitebox fuzzing. In: Proceedings of the 2nd international workshop on Random testing, p. 1 (2007)
Google Scholar
Infigo Information Security: Multiple FTP Servers vulnerabilities (2006) (accessed October 29, 2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of California, Davis, USA
Sean Whalen, Matt Bishop & James P. Crutchfield
Department of Physics, University of California, Davis, USA
Sean Whalen & James P. Crutchfield
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico, 87501, USA
James P. Crutchfield

Authors

Sean Whalen
View author publications
You can also search for this author in PubMed Google Scholar
Matt Bishop
View author publications
You can also search for this author in PubMed Google Scholar
James P. Crutchfield
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Center for Secure Information Systems, George Mason University, 4400 University Drive, 22030-4422, Fairfax, VA, USA
Sushil Jajodia
Institute for Infocomm Research, 1 Fusionopolis Way, 21-01 Connexis, 138632, South Tower, Singapore
Jianying Zhou

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Whalen, S., Bishop, M., Crutchfield, J.P. (2010). Hidden Markov Models for Automated Protocol Learning. In: Jajodia, S., Zhou, J. (eds) Security and Privacy in Communication Networks. SecureComm 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16161-2_24

Download citation

DOI: https://doi.org/10.1007/978-3-642-16161-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16160-5
Online ISBN: 978-3-642-16161-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics