Attention-Aware Feature Aggregation for Real-Time Stereo Matching on Edge Devices

Chang, Jia-Ren; Chang, Pei-Chun; Chen, Yong-Sheng

doi:10.1007/978-3-030-69525-5_22

Jia-Ren Chang^12,13,
Pei-Chun Chang¹² &
Yong-Sheng Chen¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12622))

Included in the following conference series:

Asian Conference on Computer Vision

1127 Accesses
6 Citations

Abstract

Recent works have demonstrated superior results for depth estimation from a stereo pair of images using convolutional neural networks. However, these methods require large amounts of computational resources and are not suited to real-time applications on edge devices. In this work, we propose a novel method for real-time stereo matching on edge devices, which consists of an efficient backbone for feature extraction, an attention-aware feature aggregation, and a cascaded 3D CNN architecture for multi-scale disparity estimation. The efficient backbone is designed to generate multi-scale feature maps with constrained computational power. The multi-scale feature maps are further adaptively aggregated via the proposed attention-aware feature aggregation module to improve representational capacity of features. Multi-scale cost volumes are constructed using aggregated feature maps and regularized using a cascaded 3D CNN architecture to estimate disparity maps in anytime settings. The network infers a disparity map at low resolution and then progressively refines the disparity maps at higher resolutions by calculating the disparity residuals. Because of the efficient extraction and aggregation of informative features, the proposed method can achieve accurate depth estimation in real-time inference. Experimental results demonstrated that the proposed method processed stereo image pairs with resolution 1242 \(\times \) 375 at 12–33 fps on an NVIDIA Jetson TX2 module and achieved competitive accuracy in depth estimation. The code is available at https://github.com/JiaRenChang/RealtimeStereo.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Chen, X., et al.: 3D object proposals for accurate object class detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2015)
Google Scholar
Zhang, C., Li, Z., Cheng, Y., Cai, R., Chao, H., Rui, Y.: MeshStereo: a global stereo model with mesh alignment regularization for view interpolation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2057–2065 (2015)
Google Scholar
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision 47, 7–42 (2002)
Article Google Scholar
Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17, 2 (2016)
MATH Google Scholar
Shaked, A., Wolf, L.: Improved stereo matching with constant highway networks and reflective confidence learning. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Seki, A., Pollefeys, M.: SGM-Nets: semi-global matching with neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
Google Scholar
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5418 (2018)
Google Scholar
Zhang, F., Prisacariu, V., Yang, R., Torr, P.H.: GA-Net: guided aggregation net for end-to-end stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 185–194 (2019)
Google Scholar
Hirschmuller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005, vol. 2, pp. 807–814. IEEE (2005)
Google Scholar
Xu, H., Zhang, J.: AANet: adaptive aggregation network for efficient stereo matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1959–1968 (2020)
Google Scholar
Duggal, S., Wang, S., Ma, W.C., Hu, R., Urtasun, R.: DeepPruner: learning efficient stereo matching via differentiable PatchMatch. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4384–4393 (2019)
Google Scholar
Wang, Y., et al.: Anytime stereo image depth estimation on mobile devices. In: International Conference on Robotics and Automation (ICRA) (2019)
Google Scholar
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Google Scholar
Haase, D., Amthor, M.: Rethinking depthwise separable convolutions: how intra-kernel correlations lead to improved MobileNets. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14600–14609 (2020)
Google Scholar
Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
Google Scholar
Sun, D., Yang, X., Liu, M.Y., Kautz, J.: PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2018)
Google Scholar
Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5693–5703 (2019)
Google Scholar
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Google Scholar
Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3061–3070 (2015)
Google Scholar
Khamis, S., Fanello, S., Rhemann, C., Kowdle, A., Valentin, J., Izadi, S.: StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 573–590 (2018)
Google Scholar
Yin, Z., Darrell, T., Yu, F.: Hierarchical discrete distribution decomposition for match density estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6044–6053 (2019)
Google Scholar
Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3273–3282 (2019)
Google Scholar
Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., Stefano, L.D.: Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 195–204 (2019)
Google Scholar
Schops, T., et al.: A multi-view stereo benchmark with high-resolution images and multi-camera videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3260–3269 (2017)
Google Scholar

Download references

Acknowledgments

This work was supported in part by the Taiwan Ministry of Science and Technology (Grants MOST-109-2218-E-002-038 and MOST-109-2634-F-009-015), Pervasive Artificial Intelligence Research (PAIR) Labs, Qualcomm Technologies Inc., and the Center for Emergent Functional Matter Science of National Chiao Tung University from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan.

Author information

Authors and Affiliations

Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
Jia-Ren Chang, Pei-Chun Chang & Yong-Sheng Chen
aetherAI, Taipei, Taiwan
Jia-Ren Chang

Authors

Jia-Ren Chang
View author publications
You can also search for this author in PubMed Google Scholar
Pei-Chun Chang
View author publications
You can also search for this author in PubMed Google Scholar
Yong-Sheng Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yong-Sheng Chen .

Editor information

Editors and Affiliations

Waseda University, Tokyo, Japan
Hiroshi Ishikawa
Institute of Automation of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Czech Technical University in Prague, Prague, Czech Republic
Tomas Pajdla
University of Pennsylvania, Philadelphia, PA, USA
Jianbo Shi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chang, JR., Chang, PC., Chen, YS. (2021). Attention-Aware Feature Aggregation for Real-Time Stereo Matching on Edge Devices. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12622. Springer, Cham. https://doi.org/10.1007/978-3-030-69525-5_22

Download citation

DOI: https://doi.org/10.1007/978-3-030-69525-5_22
Published: 27 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69524-8
Online ISBN: 978-3-030-69525-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics