Abstract
Neural network models are becoming increasingly complex. Large models are often modular, consisting of multiple separate, sharable components. Developing such components may require specific domain knowledge, intensive computation power, and large datasets, so companies have a strong incentive to keep them proprietary. However, when a common component is included in multiple black-box models, it can open another attack vector and weaken security. In this paper, we present a method that “extracts” the common component from black-box models using only limited resources. With a small number of data samples, an attacker can (1) obtain accurate information about the shared component, stealing proprietary intellectual property, and (2) use this component to train new tasks or to execute subsequent attacks such as model cloning, class inversion, and adversarial attacks more effectively. Comprehensive experiments demonstrate that our proposed method successfully extracts the common component through hard-label, black-box access only. Moreover, the consequent attacks remain effective against straightforward defenses that introduce noise and dummy classifiers.
Notes
- 1. We assume the adversary does not have the resources to obtain a large amount of training data.
References
Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In: ICLR (2018)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: FG, pp. 67–74 (2018)
Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. In: IEEE S&P, pp. 39–57 (2017)
Chen, P., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.: EAD: elastic-net attacks to deep neural networks via adversarial examples. In: AAAI, pp. 10–17 (2018)
Chen, S., He, Z., Sun, C., Huang, X.: Universal adversarial attack on attention and the resulting dataset DamageNet. arXiv preprint 2001.06325 (2020)
Chung, J.S., et al.: In defence of metric learning for speaker recognition. In: Interspeech (2020)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
Dong, Y., Liao, F., Pang, T., Hu, X., Zhu, J.: Discovering adversarial examples with momentum. arXiv preprint 1710.06081 (2017)
Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: NIPS, pp. 658–666 (2016)
Dosovitskiy, A., Brox, T.: Inverting visual representations with convolutional networks. In: CVPR, pp. 4829–4837 (2016)
Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: ACM CCS, pp. 1322–1333 (2015)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: NIPS, pp. 833–840 (2002)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
Lee, S., Kil, R.M.: Inverse mapping of continuous functions using local and global information. IEEE Trans. Neural Netw. 5(3), 409–423 (1994)
Lowd, D., Meek, C.: Adversarial learning. In: ACM SIGKDD, pp. 641–647 (2005)
Lu, B., Kita, H., Nishikawa, Y.: Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans. Neural Netw. 10(6), 1271–1290 (1999)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)
Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR, pp. 5188–5196 (2015)
Milli, S., Schmidt, L., Dragan, A.D., Hardt, M.: Model reconstruction from model explanations. In: FAT*, pp. 1–9 (2019)
Moosavi-Dezfooli, S., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: CVPR, pp. 2574–2582 (2016)
Nagrani, A., Chung, J.S., Xie, W., Zisserman, A.: VoxCeleb: large-scale speaker verification in the wild. Comput. Speech Lang. 60, 101027 (2020)
Nash, C., Kushman, N., Williams, C.K.I.: Inverting supervised representations with autoregressive neural density models. In: AISTATS, vol. 89, pp. 1620–1629 (2019)
Ng, H., Winkler, S.: A data-driven approach to cleaning large face datasets. In: ICIP, pp. 343–347 (2014)
Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7
Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: CVPR, pp. 4954–4963 (2019)
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: ICASSP, pp. 5206–5210 (2015)
Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: AsiaCCS, pp. 506–519 (2017)
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR, pp. 815–823 (2015)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: USENIX Security Symposium (2016)
Wang, B., Gong, N.Z.: Stealing hyperparameters in machine learning. In: IEEE S&P, pp. 36–52 (2018)
Yang, Z., Zhang, J., Chang, E., Liang, Z.: Neural network inversion in adversarial setting via background knowledge alignment. In: ACM CCS, pp. 225–240 (2019)
Acknowledgement
This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore. This work is partly supported by the Biomedical Research Council of the Agency for Science, Technology and Research, Singapore.
A Evaluation on Time Series Audio Data and Speaker Classification
The proposed strategy is generic and can be applied to different types of data and neural network architectures. We repeat a set of experiments similar to those in Sect. 4 on audio data and speaker classification tasks.
A.1 Dataset
This evaluation uses the LibriSpeech dataset [29]. The version we use contains 100 h of English speech from 251 unique speakers. We use 100 speakers to train the victim embedding classifiers and another 100 speakers to attack the victims. The remaining speakers are reserved for analysis.
A.2 Model Setup
As the victim embedder, we choose a SpeakerNet model [6] trained with data augmentation on the development set of VoxCeleb2 [24], which contains 145,569 voice recordings of 5,994 speakers. The weights are obtained directly from the authors' GitHub repository (Footnote 3). For the embedding classifiers, we again use simple models with only two fully connected layers and the same train/test split as in Sect. 4. As in our evaluation on image data, we also test a set of different combinations of settings for the victims. During the attack, we construct the tree-like substitute using ResNet34Half [13] as the trunk and shallow fully connected networks as the branches.
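The tree-like substitute can be sketched as a shared trunk with one shallow branch per victim classifier. Below is a minimal PyTorch sketch, not the paper's implementation: the toy trunk merely stands in for ResNet34Half, and the branch width (128) and embedding size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TreeSubstitute(nn.Module):
    """One shared trunk (the candidate common embedder) feeding a shallow
    fully connected branch per victim classifier."""

    def __init__(self, trunk, emb_dim, victim_class_counts):
        super().__init__()
        self.trunk = trunk  # stand-in for the ResNet34Half trunk in the paper
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, c))
            for c in victim_class_counts
        )

    def forward(self, x):
        z = self.trunk(x)                     # shared substitute embedding
        return [branch(z) for branch in self.branches]

# toy trunk; the real trunk is a convolutional network over audio features
toy_trunk = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU())
model = TreeSubstitute(toy_trunk, emb_dim=32, victim_class_counts=[10, 10, 10])
outputs = model(torch.randn(4, 64))           # one logit tensor per victim
```

Sharing the trunk across branches is what forces the substitute to learn a single embedding consistent with all victims at once.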
A.3 Attack Process
We query the victim classifiers as black-boxes. For the attack, we use 9,061 unlabeled voice recordings from 100 speakers that have no overlap with the victims' training data. The recordings amount to around 6.22% of the original dataset used to train the victim embedder, and the speakers to around 1.67% of its speakers. For all combinations of settings, we train for 100 epochs and save the best models.
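The query phase needs only hard labels from each victim. This is a minimal sketch of collecting top-1 labels from several black-box victims; the mock victims and feature dimensions are stand-ins for the real audio pipeline, not the paper's code.

```python
import numpy as np

def collect_hard_labels(victims, queries):
    """Query each black-box victim with the same unlabeled samples and
    keep only the top-1 (hard) label that it returns."""
    return [np.argmax(victim(queries), axis=1) for victim in victims]

rng = np.random.default_rng(0)
queries = rng.normal(size=(90, 40))  # stand-in for unlabeled audio features
# mock victims: any callable mapping a batch to per-class scores works here
victims = [lambda x, W=rng.normal(size=(40, 10)): x @ W for _ in range(5)]
hard_labels = collect_hard_labels(victims, queries)  # one label array per victim
```

The resulting per-victim label arrays are what supervise the corresponding branches of the substitute.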
A.4 Benchmark of Embedder Extraction
Clustering Capability of Embedder (Q1). We visualize the clustering capability of the extracted embedders using 964 voice recordings of 10 speakers from a separate testing dataset. Here we use the embedders extracted from victims with 10 classes. The embeddings generated by the original and extracted embedders are projected into 2D space using t-Distributed Stochastic Neighbor Embedding (t-SNE) [14]. We also try training from scratch with the same amount of data used to query the victim model; however, this amount of data is too small to produce any meaningful result.
In Fig. 6, we can see that embedders extracted from multiple victim classifiers indeed have significantly better clustering capabilities. The embedder extracted from a single victim performs poorly.
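Beyond visually inspecting the t-SNE plots, clustering capability can be roughly quantified by comparing within-class to between-class embedding distances. The score below is our illustrative proxy on synthetic embeddings, not a metric from the paper.

```python
import numpy as np

def clustering_score(emb, labels):
    """Mean within-class distance over mean between-class distance;
    lower means the embedder clusters the speakers more tightly."""
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    off_diag = ~np.eye(len(emb), dtype=bool)
    same = (labels[:, None] == labels[None, :]) & off_diag
    diff = labels[:, None] != labels[None, :]
    return d[same].mean() / d[diff].mean()

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)        # 10 "speakers", 20 clips each
centers = 3.0 * rng.normal(size=(10, 32))
clustered = centers[labels] + 0.3 * rng.normal(size=(200, 32))
unstructured = rng.normal(size=(200, 32))    # no per-speaker structure
```

On these synthetic embeddings, the clustered set scores well below the unstructured one, matching what the t-SNE plots show qualitatively.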
Degree of Distance Preservation (Q2). Here we perform an experiment similar to that in Sect. 4.4, computing the ratio of pairwise distances among embeddings generated by the extracted embedders and the victim embedder. We use 964 voice recordings of 10 speakers for this experiment.
In Fig. 7, we plot the distribution of the distance ratio for three embedders, extracted from 1, 5, and 10 victim classifiers of 10 classes, respectively. Extracting from more victim models yields a much smaller dispersion, indicating that the distances are better preserved.
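The dispersion of the distance ratio can be computed as follows. This is a sketch of the idea on synthetic embeddings, not the paper's exact procedure:

```python
import numpy as np

def pairwise_distances(emb):
    # Euclidean distance between every pair of embeddings
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def distance_ratio_dispersion(victim_emb, extracted_emb):
    """Spread of the extracted-to-victim pairwise distance ratio;
    smaller dispersion means distances are better preserved."""
    dv = pairwise_distances(victim_emb)
    de = pairwise_distances(extracted_emb)
    iu = np.triu_indices(len(victim_emb), k=1)   # each unordered pair once
    ratios = de[iu] / dv[iu]
    return ratios.std() / ratios.mean()          # normalised dispersion

rng = np.random.default_rng(0)
victim = rng.normal(size=(50, 16))
faithful = 2.0 * victim + 0.01 * rng.normal(size=victim.shape)  # near-isometric
unrelated = rng.normal(size=(50, 16))                           # no relation
```

A near-isometric embedder gives an almost constant ratio (tight distribution in the Fig. 7 sense), while an unrelated one does not.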
A.5 Performance in Attack Scenarios
Training a Composite Model for a New Task (S1). We evaluate the performance of the extracted embedders when they are used in new speaker classification tasks. Here we use the embedders extracted from victims with 10 classes; they are the models visualized in Fig. 6(a)(b)(c). In Table 8, we can see that the performance of the embedder increases with the number of victims available for extraction and decreases with the number of classes.
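Building such a composite model amounts to freezing the extracted embedder and training only a small head for the new task. Below is a minimal NumPy sketch with a toy frozen embedder and a softmax-regression head; all names, dimensions, and the toy task are illustrative assumptions.

```python
import numpy as np

def train_head(frozen_embed, X, y, num_classes, lr=0.5, epochs=500):
    """Train only a linear softmax head on top of a frozen (extracted)
    embedder -- the composite-model setup for a new task."""
    Z = frozen_embed(X)                        # embeddings stay fixed
    W = np.zeros((Z.shape[1], num_classes))
    for _ in range(epochs):
        logits = Z @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0         # softmax cross-entropy gradient
        W -= lr * Z.T @ p / len(y)             # update head weights only
    return W

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 8))
embed = lambda x: np.tanh(x @ A)               # toy frozen embedder
X = rng.normal(size=(200, 20))
y = (embed(X)[:, 0] > 0).astype(int)           # toy task recoverable from embedding
W = train_head(embed, X, y, num_classes=2)
acc = ((embed(X) @ W).argmax(axis=1) == y).mean()
```

Because only the head is trained, a new task can be learned from far less data than training a full model from scratch, which is exactly what makes the extracted embedder valuable to an attacker.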
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zhang, J., Tann, W.J.W., Chang, E.C., Lee, H.K. (2021). Common Component in Black-Boxes Is Prone to Attacks. In: Bertino, E., Shulman, H., Waidner, M. (eds.) Computer Security – ESORICS 2021. Lecture Notes in Computer Science, vol. 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_28
Print ISBN: 978-3-030-88417-8
Online ISBN: 978-3-030-88418-5