
Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

The lightweight nature of IoT devices makes it challenging to run deep neural networks (DNNs) locally for applications like augmented reality. Recent advances in IoT communication such as LTE-M have significantly boosted link bandwidth, enabling IoT devices to stream visual data to edge servers that run DNNs for inference. However, uncompressed visual data can still easily overload the IoT link, and the wireless spectrum is shared by numerous IoT devices, causing unstable link bandwidth. Mainstream codecs can reduce the traffic, but at the cost of severe inference accuracy drops. Recent works on differentiable JPEG train the codec to mitigate this accuracy loss, but they rely on heuristic loss-function configurations to balance the rate-accuracy tradeoff and provide no guarantee of meeting the IoT bandwidth constraint. This paper presents AutoJPEG, a bandwidth-aware adaptive compression solution that learns the JPEG encoding parameters to optimize DNN inference accuracy under bandwidth constraints. By analyzing the JPEG codec workflow, we model the compressed image size as a closed-form function of the encoding parameters. We then formulate a constrained optimization framework that minimizes the original DNN loss while ensuring the image size strictly meets the bandwidth constraint. Our evaluation validates AutoJPEG on various DNN models and datasets: it outperforms mainstream codecs (such as JPEG and WebP) as well as state-of-the-art solutions that optimize the image codec for DNN inference.
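The abstract describes a constrained formulation: minimize the task loss subject to a closed-form size bound. The paper's actual objective and solver are not reproduced on this page, so the following is only a toy sketch of that optimization shape, using projected gradient descent on a hypothetical quadratic loss with a linear "size" budget; every name here (`loss`, `size`, `project`, `solve`, `B`) is illustrative, not AutoJPEG's API.

```python
# Toy sketch (NOT AutoJPEG's actual method): minimize a quadratic surrogate
# loss over encoding parameters q subject to a linear "size" budget
# size(q) <= B, using projected gradient descent.

def size(q):
    # hypothetical closed-form size model: linear in the parameters
    return sum(q)

def loss(q, t):
    # hypothetical task loss: squared distance to an unconstrained optimum t
    return sum((qi - ti) ** 2 for qi, ti in zip(q, t))

def grad(q, t):
    # gradient of the quadratic loss above
    return [2.0 * (qi - ti) for qi, ti in zip(q, t)]

def project(q, B):
    # keep parameters nonnegative, then rescale uniformly if over budget
    q = [max(qi, 0.0) for qi in q]
    s = size(q)
    return [qi * B / s for qi in q] if s > B else q

def solve(t, B, lr=0.1, steps=200):
    q = [0.0] * len(t)
    for _ in range(steps):
        q = [qi - lr * gi for qi, gi in zip(q, grad(q, t))]
        q = project(q, B)  # enforce the budget at every step
    return q

# unconstrained optimum [3, 1] has size 4; the budget B = 2 forces a rescale
q = solve([3.0, 1.0], B=2.0)
```

In this toy setup the iterate settles near [1.5, 0.5]: the size budget is met exactly while staying close (in this projection's geometry) to the unconstrained optimum, which is the flavor of guarantee the abstract claims, in contrast to penalty-weighted losses that may overshoot the budget.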


Notes

  1. Although the max function is also non-differentiable, gradients can still pass through it during backward propagation (just like ReLU).
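For intuition, a minimal numeric check in plain Python (illustrative only): max(x, 0) has a well-defined gradient everywhere except the single kink at 0, and that gradient matches a finite-difference estimate away from the kink, which is why backpropagation can route gradients through max exactly as it does through ReLU.

```python
def relu(x):
    # max(x, 0): non-differentiable only at the single point x == 0
    return max(x, 0.0)

def relu_grad(x):
    # subgradient used in backprop: 1 where max selects x, else 0
    return 1.0 if x > 0.0 else 0.0

# finite-difference check away from the kink
eps = 1e-6
for x in (2.0, -1.5):
    fd = (relu(x + eps) - relu(x - eps)) / (2 * eps)
    assert abs(fd - relu_grad(x)) < 1e-6
```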


Author information

Correspondence to Xiufeng Xie.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xie, X., Zhou, N., Zhu, W., Liu, J. (2022). Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13698. Springer, Cham. https://doi.org/10.1007/978-3-031-19839-7_6


  • DOI: https://doi.org/10.1007/978-3-031-19839-7_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19838-0

  • Online ISBN: 978-3-031-19839-7

  • eBook Packages: Computer Science, Computer Science (R0)
