Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries

Polino, Mario; Scorti, Andrea; Maggi, Federico; Zanero, Stefano

doi:10.1007/978-3-319-20550-2_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9148)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

When analyzing an untrusted binary, reverse engineers usually rely on ad-hoc collections of interesting dynamic patterns (known as behaviors in the malware-analysis community) and static patterns (known as signatures in the antivirus community). Such patterns are often part of the analyst's skill set, and are sometimes implemented in manually created post-processing scripts. It would be desirable to find such behaviors automatically, present them to analysts, and create a systematic catalog of matching rules and relevant implementations. We propose Jackdaw, a system that finds interesting dynamic patterns and ranks them to unveil potentially interesting behaviors. It then annotates them with static information, capturing the distinct implementations of each behavior across different malware families. Finally, Jackdaw associates semantic information with the behaviors to create a descriptive summary that helps analysts query the catalog of behaviors by type. To do this, it leverages the dynamic information and indexed Web-based knowledge databases.
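As a rough illustration of what "finding and ranking dynamic patterns" can mean, the sketch below counts contiguous API-call n-grams across execution traces and ranks them by how many traces contain them. This is not Jackdaw's algorithm, which the abstract does not detail; the function names `ngrams` and `rank_patterns`, the choice of n-grams, and the sample traces are all assumptions made for illustration (the Win32 API names are real, the sequences are invented).

```python
from collections import Counter


def ngrams(trace, n):
    """Yield every contiguous run of n API calls in one trace."""
    return (tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def rank_patterns(traces, n=3):
    """Rank n-grams by the number of distinct traces that contain them."""
    support = Counter()
    for trace in traces:
        # A set ensures each trace contributes at most once per pattern.
        support.update(set(ngrams(trace, n)))
    # Descending support first, then lexicographic order for stable output.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical traces, one list of API calls per analyzed binary.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle", "RegOpenKeyExW"],
    ["Sleep", "CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
]

ranked = rank_patterns(traces)
top_pattern, top_support = ranked[0]
```

Here the file-writing sequence appears in two of the three traces and therefore ranks first. A real system would need a far richer notion of "interesting" than raw support.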

We implement and demonstrate Jackdaw on the Win32 API (although the technique can be generalized to any OS). On a dataset of 2,136 distinct binaries, including both malicious and benign libraries and executables, we compared the automatically extracted behaviors against a ground truth of 44 behaviors created manually by expert analysts. Jackdaw found 77.3% of them and was able to exclude spurious behaviors in 99.6% of cases. We also discovered 466 novel behaviors, among which manual exploration and review by expert reverse engineers revealed interesting findings and confirmed the correctness of the semantic tagging.
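The reported detection rate can be turned back into an absolute count. The sketch below assumes only the two figures stated in the abstract (44 ground-truth behaviors, 77.3% found) and searches for the integer count consistent with that percentage; the variable names are illustrative.

```python
ground_truth = 44        # manually curated behaviors (from the abstract)
reported_percent = 77.3  # share of them Jackdaw found (from the abstract)

# Every integer count whose ratio rounds to the reported percentage.
matches = [
    k for k in range(ground_truth + 1)
    if round(100 * k / ground_truth, 1) == reported_percent
]

found = matches[0]
missed = ground_truth - found
```

Only one count fits: 34 of the 44 ground-truth behaviors were found, leaving 10 undetected.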

Author information

Authors and Affiliations

DEIB, Politecnico di Milano, Milan, Italy
Mario Polino, Andrea Scorti, Federico Maggi & Stefano Zanero

Corresponding author

Correspondence to Mario Polino.

Editor information

Editors and Affiliations

Chalmers University of Technology, Gothenburg, Sweden
Magnus Almgren
Chalmers University of Technology, Gothenburg, Sweden
Vincenzo Gulisano
Politecnico di Milano, Milan, Italy
Federico Maggi

About this paper

Cite this paper

Polino, M., Scorti, A., Maggi, F., Zanero, S. (2015). Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries. In: Almgren, M., Gulisano, V., Maggi, F. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2015. Lecture Notes in Computer Science, vol 9148. Springer, Cham. https://doi.org/10.1007/978-3-319-20550-2_7

DOI: https://doi.org/10.1007/978-3-319-20550-2_7
Published: 23 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20549-6
Online ISBN: 978-3-319-20550-2
eBook Packages: Computer Science, Computer Science (R0)
