Automatic Annotation of Confidential Data in Java Code

  • Conference paper
Foundations and Practice of Security (FPS 2021)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13291))

Abstract

Leaks of confidential information can be addressed with automatic tools that take a set of annotated inputs (the sources) and track their flow to public sinks. Unfortunately, manually annotating code with labels that specify the secret sources is one of the main obstacles to the adoption of such trackers.

In this work, we present an approach for the automatic generation of labels for confidential data in Java programs. Our solution is based on a graph-based representation of Java methods: starting from a minimal set of known API calls, it propagates the labels both intra- and inter-procedurally until a fix-point is reached.
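
The propagation idea can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract says theirs is encoded in Datalog); it is a hypothetical worklist version in Java, assuming the program has already been reduced to a graph whose edges mean "data flows from this node to that node". The class name `LabelPropagation`, the method `propagate`, and the node names in `main` are illustrative assumptions, not taken from the paper.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LabelPropagation {

    /**
     * Propagates a "confidential" label from the seed nodes along flow edges
     * until no new node can be labelled, i.e. until a fix-point is reached.
     * Assumption: flowsTo maps a node to the nodes its value flows into,
     * covering both intra- and inter-procedural edges.
     */
    static Set<String> propagate(Set<String> seeds, Map<String, List<String>> flowsTo) {
        Set<String> confidential = new HashSet<>(seeds);
        Deque<String> worklist = new ArrayDeque<>(seeds);
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            for (String successor : flowsTo.getOrDefault(node, List.of())) {
                // add() returns true only for newly labelled nodes, so each
                // node is processed at most once and the loop terminates.
                if (confidential.add(successor)) {
                    worklist.push(successor);
                }
            }
        }
        return confidential;
    }

    public static void main(String[] args) {
        // Hypothetical flow graph: one known API call seeds the labelling.
        Map<String, List<String>> flowsTo = Map.of(
            "getPassword()", List.of("pwd"),          // API result stored in a local
            "pwd", List.of("hash(arg0)"),             // local passed to a callee
            "hash(arg0)", List.of("hash(return)"),    // callee parameter reaches its result
            "userName", List.of("greeting"));         // unrelated, stays unlabelled
        System.out.println(propagate(Set.of("getPassword()"), flowsTo));
    }
}
```

In this sketch the seed set plays the role of the minimal set of known API calls, and the returned set is the collection of program points that would receive a confidentiality label. A real analysis would also need to decide how call sites, fields, and library methods contribute edges, which the sketch leaves out.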

In our evaluation, we encode our synthesis and propagation algorithm in Datalog and assess the accuracy of our technique on seven previously annotated internal code bases, where we can reconstruct 75% of the pre-existing manual annotations. In addition to this single data point, we also perform an assessment using samples from the SecuriBench-micro benchmark, and we provide additional sample programs that demonstrate the capabilities and the limitations of our approach.

Acknowledgments

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information

Corresponding author

Correspondence to Iulia Bastys.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Bastys, I., Bolignano, P., Raimondi, F., Schoepe, D. (2022). Automatic Annotation of Confidential Data in Java Code. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2021. Lecture Notes in Computer Science, vol 13291. Springer, Cham. https://doi.org/10.1007/978-3-031-08147-7_10

  • DOI: https://doi.org/10.1007/978-3-031-08147-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08146-0

  • Online ISBN: 978-3-031-08147-7

  • eBook Packages: Computer Science, Computer Science (R0)
