
MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12373)

Abstract

Recently, end-to-end CNNs have shown remarkable performance for disparity estimation. However, most of them are too heavy for resource-constrained devices, because of the enormous number of parameters required for satisfactory results. To address this issue, we propose two compact stereo networks, MABNet and its lighter version MABNet_tiny. MABNet is based on a novel Multibranch Adjustable Bottleneck (MAB) module, which is less demanding in parameters and computation. In a MAB module, the feature map is split into several parallel branches, where depthwise separable convolutions with different dilation rates extract features with multiple receptive fields at an affordable computational budget. In addition, the number of channels in each branch can be adjusted independently to trade off computation against accuracy. On the SceneFlow and KITTI datasets, our MABNet achieves competitive accuracy with only 1.65M parameters. In particular, MABNet_tiny reduces the parameter count to 47K by cutting down the channels and layers of MABNet.
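The efficiency claims above rest on two standard building blocks: depthwise separable convolution (a per-channel spatial filter followed by a 1x1 pointwise mix) and dilated convolution (which enlarges the receptive field at no parameter cost). The arithmetic below is an illustrative sketch using hypothetical channel counts, not the paper's actual MAB configuration, showing why these choices keep the parameter budget small.

```python
def conv_params(k, c_in, c_out):
    # Standard k x k convolution: one k*k*c_in kernel per output channel.
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    # Depthwise separable: one k x k depthwise filter per input channel,
    # then a 1x1 pointwise convolution to mix channels.
    return k * k * c_in + c_in * c_out

def dilated_receptive_field(k, d):
    # A k x k kernel with dilation d covers k + (k-1)*(d-1) pixels per side,
    # using exactly the same number of weights as the undilated kernel.
    return k + (k - 1) * (d - 1)

# Hypothetical example: a 3x3 layer mapping 64 -> 64 channels.
std = conv_params(3, 64, 64)       # 36864 weights
ds = ds_conv_params(3, 64, 64)     # 576 + 4096 = 4672 weights
print(std, ds, round(std / ds, 1))  # roughly 7.9x fewer parameters

# Parallel branches with dilation rates 1, 2, 4 see growing receptive
# fields for free, which is how the MAB module mixes scales cheaply.
print([dilated_receptive_field(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

The same reasoning explains the adjustable-channel knob: since the pointwise term c_in * c_out dominates, shrinking a branch's channel count reduces its cost roughly quadratically.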



Acknowledgement

This research was supported by the Key Science and Technology Projects in Jiangsu Province (Grant No. BE2018002-2) and the National Natural Science Foundation of China (Grant No. 61974024).

Author information

Correspondence to Zhi Qi.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Xing, J., Qi, Z., Dong, J., Cai, J., Liu, H. (2020). MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_21

  • DOI: https://doi.org/10.1007/978-3-030-58604-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58603-4

  • Online ISBN: 978-3-030-58604-1

  • eBook Packages: Computer Science, Computer Science (R0)
