Abstract
Neural network models are becoming increasingly complex. Large models are often modular, consisting of multiple separate, sharable components. Developing such components may require specific domain knowledge, intensive computation power, and large datasets, so companies have a strong incentive to keep them proprietary. However, when a common component is included in multiple black-box models, it can open another attack vector and weaken security. In this paper, we present a method that “extracts” the common component from black-box models using only limited resources. With a small number of data samples, an attacker can (1) obtain accurate information about the shared component, stealing proprietary intellectual property, and (2) use this component to train new tasks or to execute subsequent attacks such as model cloning, class inversion, and adversarial attacks more effectively. Comprehensive experiments demonstrate that our proposed method successfully extracts the common component through hard-label, black-box access only. Moreover, the consequent attacks remain effective against straightforward defenses that introduce noise and dummy classifiers.
Notes
- 1. We assume the adversary does not have the resources to obtain a large amount of training data.
References
Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In: ICLR (2018)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: FG, pp. 67–74 (2018)
Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. In: IEEE S&P, pp. 39–57 (2017)
Chen, P., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.: EAD: elastic-net attacks to deep neural networks via adversarial examples. In: AAAI, pp. 10–17 (2018)
Chen, S., He, Z., Sun, C., Huang, X.: Universal adversarial attack on attention and the resulting dataset DamageNet. arXiv preprint 2001.06325 (2020)
Chung, J.S., et al.: In defence of metric learning for speaker recognition. In: Interspeech (2020)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
Dong, Y., Liao, F., Pang, T., Hu, X., Zhu, J.: Discovering adversarial examples with momentum. arXiv preprint 1710.06081 (2017)
Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: NIPS, pp. 658–666 (2016)
Dosovitskiy, A., Brox, T.: Inverting visual representations with convolutional networks. In: CVPR, pp. 4829–4837 (2016)
Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: ACM CCS, pp. 1322–1333 (2015)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: NIPS, pp. 833–840 (2002)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
Lee, S., Kil, R.M.: Inverse mapping of continuous functions using local and global information. IEEE Trans. Neural Netw. 5(3), 409–423 (1994)
Lowd, D., Meek, C.: Adversarial learning. In: ACM SIGKDD, pp. 641–647 (2005)
Lu, B., Kita, H., Nishikawa, Y.: Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans. Neural Netw. 10(6), 1271–1290 (1999)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)
Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR, pp. 5188–5196 (2015)
Milli, S., Schmidt, L., Dragan, A.D., Hardt, M.: Model reconstruction from model explanations. In: FAT*, pp. 1–9 (2019)
Moosavi-Dezfooli, S., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: CVPR, pp. 2574–2582 (2016)
Nagrani, A., Chung, J.S., Xie, W., Zisserman, A.: VoxCeleb: large-scale speaker verification in the wild. Comput. Speech Lang. 60, 101027 (2020)
Nash, C., Kushman, N., Williams, C.K.I.: Inverting supervised representations with autoregressive neural density models. In: AISTATS, vol. 89, pp. 1620–1629 (2019)
Ng, H., Winkler, S.: A data-driven approach to cleaning large face datasets. In: ICIP, pp. 343–347 (2014)
Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7
Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: CVPR, pp. 4954–4963 (2019)
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: ICASSP, pp. 5206–5210 (2015)
Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: AsiaCCS, pp. 506–519 (2017)
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR, pp. 815–823 (2015)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: USENIX Security Symposium (2016)
Wang, B., Gong, N.Z.: Stealing hyperparameters in machine learning. In: IEEE S&P, pp. 36–52 (2018)
Yang, Z., Zhang, J., Chang, E., Liang, Z.: Neural network inversion in adversarial setting via background knowledge alignment. In: ACM CCS, pp. 225–240 (2019)
Acknowledgement
This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore. This work is partly supported by the Biomedical Research Council of the Agency for Science, Technology and Research, Singapore.
A Evaluation on Time Series Audio Data and Speaker Classification
The proposed strategy is generic and can be applied to different types of data and neural network architectures. We repeat a set of experiments similar to those in Sect. 4 on audio data and speaker classification tasks.
A.1 Dataset
This evaluation uses the LibriSpeech dataset [29]. The version we use contains 100 h of English speech from 251 unique speakers. We use 100 speakers to train the victim embedding classifiers and another 100 speakers to attack the victims. The remaining speakers are reserved for analysis.
A.2 Model Setup
As the victim embedder, we choose a SpeakerNet model [6] trained with data augmentation on the development set of VoxCeleb2 [24], which contains 145,569 voice recordings of 5,994 speakers. The weights are obtained directly from the authors' GitHub repository (Footnote 3). For the embedding classifiers, we again use simple models with only two fully connected layers and the same train/test split as in Sect. 4. As in our evaluation on image data, we also test a set of different combinations of settings for the victims. During the attack, we construct the tree-like substitute using ResNet34Half [13] as the trunk and shallow fully connected networks as the branches.
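The tree-like substitute can be sketched as a shared trunk with one shallow branch per victim classifier. Below is a minimal PyTorch sketch, not the paper's implementation: the toy trunk merely stands in for ResNet34Half, and the branch width (128) and embedding size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TreeSubstitute(nn.Module):
    """One shared trunk (the candidate common embedder) feeding a shallow
    fully connected branch per victim classifier."""

    def __init__(self, trunk, emb_dim, victim_class_counts):
        super().__init__()
        self.trunk = trunk  # stand-in for the ResNet34Half trunk in the paper
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, c))
            for c in victim_class_counts
        )

    def forward(self, x):
        z = self.trunk(x)                     # shared substitute embedding
        return [branch(z) for branch in self.branches]

# toy trunk; the real trunk is a convolutional network over audio features
toy_trunk = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU())
model = TreeSubstitute(toy_trunk, emb_dim=32, victim_class_counts=[10, 10, 10])
outputs = model(torch.randn(4, 64))           # one logit tensor per victim
```

Sharing the trunk across branches is what forces the substitute to learn a single embedding consistent with all victims at once.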
A.3 Attack Process
We query the victim classifiers as black-boxes. For the attack, we use 9,061 unlabeled voice recordings from 100 speakers that have no overlap with the victims' training data. The recordings amount to around 6.22% of the original dataset used to train the victim embedder, and the speakers to around 1.67% of its speakers. For all combinations of settings, we train for 100 epochs and save the best models.
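The query phase needs only hard labels from each victim. This is a minimal sketch of collecting top-1 labels from several black-box victims; the mock victims and feature dimensions are stand-ins for the real audio pipeline, not the paper's code.

```python
import numpy as np

def collect_hard_labels(victims, queries):
    """Query each black-box victim with the same unlabeled samples and
    keep only the top-1 (hard) label that it returns."""
    return [np.argmax(victim(queries), axis=1) for victim in victims]

rng = np.random.default_rng(0)
queries = rng.normal(size=(90, 40))  # stand-in for unlabeled audio features
# mock victims: any callable mapping a batch to per-class scores works here
victims = [lambda x, W=rng.normal(size=(40, 10)): x @ W for _ in range(5)]
hard_labels = collect_hard_labels(victims, queries)  # one label array per victim
```

The resulting per-victim label arrays are what supervise the corresponding branches of the substitute.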
A.4 Benchmark of Embedder Extraction
Clustering Capability of Embedder (Q1). We visualize the clustering capability of the extracted embedders using 964 voice recordings of 10 speakers from a separate testing dataset. Here we use the embedders extracted from victims with 10 classes. The embeddings generated by the original and extracted embedders are projected into 2D space using t-Distributed Stochastic Neighbor Embedding (t-SNE) [14]. We also try training from scratch with the same amount of data used to query the victim model; however, this amount of data is too small to produce any meaningful result.
In Fig. 6, we can see that embedders extracted from multiple victim classifiers indeed have significantly better clustering capabilities. The embedder extracted from a single victim performs poorly.
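Beyond visually inspecting the t-SNE plots, clustering capability can be roughly quantified by comparing within-class to between-class embedding distances. The score below is our illustrative proxy on synthetic embeddings, not a metric from the paper.

```python
import numpy as np

def clustering_score(emb, labels):
    """Mean within-class distance over mean between-class distance;
    lower means the embedder clusters the speakers more tightly."""
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    off_diag = ~np.eye(len(emb), dtype=bool)
    same = (labels[:, None] == labels[None, :]) & off_diag
    diff = labels[:, None] != labels[None, :]
    return d[same].mean() / d[diff].mean()

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)        # 10 "speakers", 20 clips each
centers = 3.0 * rng.normal(size=(10, 32))
clustered = centers[labels] + 0.3 * rng.normal(size=(200, 32))
unstructured = rng.normal(size=(200, 32))    # no per-speaker structure
```

On these synthetic embeddings, the clustered set scores well below the unstructured one, matching what the t-SNE plots show qualitatively.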
Degree of Distance Preservation (Q2). Here we perform an experiment similar to that in Sect. 4.4, computing the ratio of pairwise distances among embeddings generated by the extracted embedders and the victim embedder. We use 964 voice recordings of 10 speakers for this experiment.
In Fig. 7, we plot the distribution of the distance ratio for three embedders, extracted from 1, 5, and 10 victim classifiers of 10 classes, respectively. Extracting from more victim models yields a much smaller dispersion, indicating that the distances are better preserved.
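The dispersion of the distance ratio can be computed as follows. This is a sketch of the idea on synthetic embeddings, not the paper's exact procedure:

```python
import numpy as np

def pairwise_distances(emb):
    # Euclidean distance between every pair of embeddings
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def distance_ratio_dispersion(victim_emb, extracted_emb):
    """Spread of the extracted-to-victim pairwise distance ratio;
    smaller dispersion means distances are better preserved."""
    dv = pairwise_distances(victim_emb)
    de = pairwise_distances(extracted_emb)
    iu = np.triu_indices(len(victim_emb), k=1)   # each unordered pair once
    ratios = de[iu] / dv[iu]
    return ratios.std() / ratios.mean()          # normalised dispersion

rng = np.random.default_rng(0)
victim = rng.normal(size=(50, 16))
faithful = 2.0 * victim + 0.01 * rng.normal(size=victim.shape)  # near-isometric
unrelated = rng.normal(size=(50, 16))                           # no relation
```

A near-isometric embedder gives an almost constant ratio (tight distribution in the Fig. 7 sense), while an unrelated one does not.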
A.5 Performance in Attack Scenarios
Training a Composite Model for a New Task (S1). We evaluate the performance of the extracted embedders when they are used in new speaker classification tasks. Here we use the embedders extracted from victims with 10 classes; they are the models visualized in Fig. 6(a)(b)(c). In Table 8, we can see that the performance of the embedder increases with the number of victims available for extraction and decreases with the number of classes.
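Building such a composite model amounts to freezing the extracted embedder and training only a small head for the new task. Below is a minimal NumPy sketch with a toy frozen embedder and a softmax-regression head; all names, dimensions, and the toy task are illustrative assumptions.

```python
import numpy as np

def train_head(frozen_embed, X, y, num_classes, lr=0.5, epochs=500):
    """Train only a linear softmax head on top of a frozen (extracted)
    embedder -- the composite-model setup for a new task."""
    Z = frozen_embed(X)                        # embeddings stay fixed
    W = np.zeros((Z.shape[1], num_classes))
    for _ in range(epochs):
        logits = Z @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0         # softmax cross-entropy gradient
        W -= lr * Z.T @ p / len(y)             # update head weights only
    return W

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 8))
embed = lambda x: np.tanh(x @ A)               # toy frozen embedder
X = rng.normal(size=(200, 20))
y = (embed(X)[:, 0] > 0).astype(int)           # toy task recoverable from embedding
W = train_head(embed, X, y, num_classes=2)
acc = ((embed(X) @ W).argmax(axis=1) == y).mean()
```

Because only the head is trained, a new task can be learned from far less data than training a full model from scratch, which is exactly what makes the extracted embedder valuable to an attacker.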
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Zhang, J., Tann, W.J.W., Chang, E.C., Lee, H.K. (2021). Common Component in Black-Boxes Is Prone to Attacks. In: Bertino, E., Shulman, H., Waidner, M. (eds.) Computer Security – ESORICS 2021. Lecture Notes in Computer Science, vol. 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_28
Print ISBN: 978-3-030-88417-8
Online ISBN: 978-3-030-88418-5