Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2015)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9148)

Abstract

When analyzing an untrusted binary, reverse engineers usually rely on ad-hoc collections of interesting dynamic patterns (known as behaviors in the malware-analysis community) and static patterns (known as signatures in the antivirus community). Such patterns are often part of the analyst's skill set and are sometimes implemented in manually created post-processing scripts. It would be desirable to find such behaviors automatically, present them to analysts, and create a systematic catalog of matching rules and relevant implementations. We propose Jackdaw, a system that finds interesting dynamic patterns and ranks them to unveil potentially interesting behaviors. It then annotates them with static information, capturing the distinct implementations of each behavior across different malware families. Finally, Jackdaw associates semantic information with the behaviors to create a descriptive summary that helps analysts query the catalog of behaviors by type. To do this, it leverages the dynamic information and an indexed Web-based knowledge database.
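As a rough illustration of what finding and ranking dynamic patterns could look like, the sketch below mines contiguous API-call sequences from execution traces and ranks them by the number of distinct binaries that exhibit them. This is not Jackdaw's algorithm, which the abstract does not specify; the n-gram approach, the function names `ngrams` and `rank_patterns`, the `min_support` threshold, and the sample traces are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def ngrams(trace: List[str], n: int) -> Set[Pattern]:
    """Return the set of contiguous n-call windows found in one trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def rank_patterns(
    traces: Dict[str, List[str]], n: int = 3, min_support: int = 2
) -> List[Tuple[Pattern, int]]:
    """Rank call patterns by how many distinct binaries contain them."""
    support: Counter = Counter()
    for calls in traces.values():
        # A set is used so each pattern counts once per binary.
        support.update(ngrams(calls, n))
    ranked = [(p, c) for p, c in support.items() if c >= min_support]
    # Highest support first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda pc: (-pc[1], pc[0]))
    return ranked


# Hypothetical traces of Win32 API calls, one per binary.
traces = {
    "sample_a": ["CreateFileW", "WriteFile", "CloseHandle", "RegOpenKeyExW"],
    "sample_b": ["socket", "connect", "send", "closesocket"],
    "sample_c": ["GetTempPathW", "CreateFileW", "WriteFile", "CloseHandle"],
}

for pattern, count in rank_patterns(traces):
    print(count, " -> ".join(pattern))
```

In this toy input, only the file-writing sequence recurs in more than one binary, so it is the only candidate behavior reported. A real system would additionally need data-flow information and static annotations, which this sketch omits.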

We implement and demonstrate Jackdaw on the Win32 API (although the technique can be generalized to any OS). On a dataset of 2,136 distinct binaries, including both malicious and benign libraries and executables, we compared the automatically extracted behaviors against a ground truth of 44 behaviors created manually by expert analysts. Jackdaw found 77.3% of them and was able to exclude spurious behaviors in 99.6% of cases. We also discovered 466 novel behaviors, among which manual exploration and review by expert reverse engineers revealed interesting findings and confirmed the correctness of the semantic tagging.
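The reported detection rate can be related back to absolute counts with a short calculation. The abstract gives only the ground-truth size (44) and the percentage (77.3%); the count of 34 matched behaviors below is inferred from those two figures and is not stated in the source.

```python
ground_truth = 44        # manually created behaviors (from the abstract)
reported_rate = 0.773    # fraction found by Jackdaw (from the abstract)

# Inferred, not stated in the source: the nearest whole number of behaviors.
found = round(ground_truth * reported_rate)
missed = ground_truth - found
recall = found / ground_truth

print(f"found={found}, missed={missed}, recall={recall:.1%}")
```

Under this reading, about 10 of the 44 ground-truth behaviors were not recovered.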

Author information

Corresponding author

Correspondence to Mario Polino.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Polino, M., Scorti, A., Maggi, F., Zanero, S. (2015). Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science, vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_7

  • DOI: https://doi.org/10.1007/978-3-319-20550-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20549-6

  • Online ISBN: 978-3-319-20550-2

  • eBook Packages: Computer Science (R0)