Abstract
Deep learning based on convolutional neural networks (CNNs) has been successfully applied to stereo matching, as it can accelerate training and improve matching accuracy. However, existing CNN-based stereo matching frameworks often suffer from two problems. The first is the limited generalization ability of the trained model. Stereo matching networks are usually pre-trained on the large synthetic Scene Flow dataset and then fine-tuned on an evaluation dataset. However, the evaluation dataset may contain only trivial training data, or may even lack disparity labels for certain tasks, which adversely affects the generality of the trained model. The second is poor matching performance in ill-posed regions, which are difficult to match reliably; these include weakly textured areas, repeated textures, occlusions, reflective surfaces, and fine structures. To alleviate these problems, we propose the cost volume enhancement network (CVE-Net), guided by sparse features, for stereo matching. CVE-Net uses edge and saliency information to sparsely sample precise disparity labels during training, and it enhances the cost volume by leveraging this precise sparse label information to guide the training. Experiments show that the generalization ability is significantly improved and that the domain-transfer problem on new datasets is significantly alleviated. In addition, introducing sparse multi-semantic features improves matching performance in ill-posed regions; even without fine-tuning, the matching requirements can be met. These results demonstrate the effectiveness of CVE-Net.
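To make the cost-volume-enhancement idea concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: it assumes the cost volume stores a per-pixel probability distribution over disparity candidates, and at the sparsely labelled pixels it blends that distribution toward a unimodal Gaussian peak centred on the precise disparity label. The function name, the Gaussian target, and the blend weight `alpha` are illustrative assumptions.

```python
import numpy as np

def enhance_cost_volume(cost_volume, sparse_disp, valid_mask, sigma=1.0, alpha=0.5):
    """Blend a cost volume toward unimodal targets at sparsely labelled pixels.

    cost_volume : (D, H, W) array, a probability over D disparity candidates per pixel.
    sparse_disp : (H, W) array of disparity labels (meaningful only where valid_mask is True).
    valid_mask  : (H, W) boolean array marking the sparsely sampled label pixels.
    """
    D, H, W = cost_volume.shape
    disp_levels = np.arange(D).reshape(D, 1, 1)              # candidate disparities

    # Soft unimodal target: Gaussian peak around each sparse label,
    # normalised over the disparity dimension.
    target = np.exp(-0.5 * ((disp_levels - sparse_disp) / sigma) ** 2)
    target /= target.sum(axis=0, keepdims=True)

    enhanced = cost_volume.copy()
    # Blend only at labelled pixels; unlabelled pixels keep the original volume.
    enhanced[:, valid_mask] = ((1 - alpha) * cost_volume[:, valid_mask]
                               + alpha * target[:, valid_mask])
    return enhanced
```

In a trained network this guidance would enter through the loss or an attention-style reweighting rather than a fixed post-hoc blend; the sketch only shows how sparse precise labels can sharpen the disparity distribution at selected pixels while leaving the rest of the volume untouched.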
Acknowledgements
This work was supported by the Guangdong Basic and Applied Basic Research Foundation, Grant No. 2019A1515011078, and the Guangzhou Scientific and Technological Plan Project, No. 201904010228.
Author information
Contributions
All authors contributed to the research, experiments, and manuscript. Guangyi Huang and Yongyi Gong were responsible for the design of the algorithm and the preparation of the experiments. The experiments and related discussion were carried out by Qingzhen Xu, Shuang Liu, Guangyi Huang, Kun Zeng, Yongyi Gong, and Xiaonan Luo. Qingzhen Xu, Shuang Liu, and Guangyi Huang wrote the manuscript. Kun Zeng, Yongyi Gong, and Xiaonan Luo were responsible for the final optimization. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Xu, Q., Liu, S., Huang, G. et al. CVE-Net: cost volume enhanced network guided by sparse features for stereo matching. Soft Comput 25, 15183–15199 (2021). https://doi.org/10.1007/s00500-021-06257-4