Shingled Graph Disassembly: Finding the Undecideable Path

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8443)

Abstract

A probabilistic finite state machine approach to statically disassembling x86 machine language programs is presented and evaluated. Static disassembly is a crucial prerequisite for software reverse engineering, and has many applications in computer security and binary analysis. The general problem is provably undecidable because of the heavy use of unaligned instruction encodings and dynamically computed control flows in the x86 architecture. Limited work in machine learning and data mining has been undertaken on this subject. This paper shows that semantic meanings of opcode sequences can be leveraged to infer similarities between groups of opcode and operand sequences. This empowers a probabilistic finite state machine to learn statistically significant opcode and operand sequences in a training corpus of disassemblies. The similarities demonstrate the statistical significance of opcodes and operands in a surrounding context, facilitating more accurate disassembly of new binaries. Empirical results demonstrate that the algorithm is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.
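To make the idea of learning statistically significant opcode sequences concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a plain first-order (bigram) transition model over opcode mnemonics with add-one smoothing, and the toy corpus, the function names `train_bigram_model` and `log_likelihood`, and the two candidate sequences are all hypothetical. It only illustrates how transition statistics from a training corpus of disassemblies could rank one candidate decoding of a byte range above another.

```python
import math
from collections import defaultdict


def train_bigram_model(corpus):
    """Count opcode-to-opcode transitions in a corpus of opcode sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for seq in corpus:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def log_likelihood(seq, counts, vocab):
    """Sum of log transition probabilities in seq, with add-one smoothing."""
    v = len(vocab) + 1  # one extra slot so unseen opcodes get nonzero mass
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, {})
        total += math.log((row.get(nxt, 0) + 1) / (sum(row.values()) + v))
    return total


# Hypothetical training corpus: opcode mnemonics from known-good disassemblies.
corpus = [
    ["push", "mov", "sub", "call", "add", "pop", "ret"],
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "sub", "mov", "call", "add", "pop", "ret"],
]
counts, vocab = train_bigram_model(corpus)

# Two hypothetical candidate decodings of the same byte range.
plausible = ["push", "mov", "call", "pop", "ret"]
implausible = ["ret", "pop", "call", "mov", "push"]

score_plausible = log_likelihood(plausible, counts, vocab)
score_implausible = log_likelihood(implausible, counts, vocab)
print(score_plausible > score_implausible)
```

A disassembler built along these lines would prefer the candidate with the higher score. The approach described in the abstract additionally models operands and semantic similarity between opcode groups, which this sketch omits.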

The research reported herein was supported in part by AFOSR awards FA9550-12-1-0082 & FA9550-10-1-0088, NIH awards 1R0-1LM009989 & 1R01HG006844, NSF awards #1054629, Career-CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, & CNS-1228198, ARO award W911NF-12-1-0558, and ONR award N00014-14-1-0030.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wartell, R., Zhou, Y., Hamlen, K.W., Kantarcioglu, M. (2014). Shingled Graph Disassembly: Finding the Undecideable Path. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_23

  • DOI: https://doi.org/10.1007/978-3-319-06608-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06607-3

  • Online ISBN: 978-3-319-06608-0

  • eBook Packages: Computer Science, Computer Science (R0)
