Abstract
Deep learning based on convolutional neural networks (CNNs) has been successfully applied to stereo matching, as it can accelerate training and improve matching accuracy. However, existing CNN-based stereo matching frameworks often suffer from two problems. The first is the limited generalization ability of the trained model. Stereo matching networks are usually pre-trained on the large synthetic Scene Flow dataset and then fine-tuned on an evaluation dataset. However, the evaluation dataset may contain only trivial training data, or may even lack disparity labels for certain tasks, which adversely affects the generality of the trained model. The second is poor matching performance in ill-posed regions, which are difficult to match reliably; these include weakly textured areas, repeated textures, occlusions, reflective surfaces, and fine structures. To alleviate these problems, we propose the cost volume enhancement network (CVE-Net), guided by sparse features, for stereo matching. CVE-Net uses edge and saliency information to sparsely sample precise disparity labels during training, and it enhances the cost volume by leveraging this precise sparse label information to guide the training. Experiments show that the generalization ability is significantly improved and that the domain-transfer problem on new datasets is significantly alleviated. In addition, introducing sparse multi-semantic features improves matching performance in ill-posed regions; even without fine-tuning, the matching requirements can be met. These results demonstrate the effectiveness of CVE-Net.
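To make the cost-volume-enhancement idea concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: it assumes the cost volume stores a per-pixel probability distribution over disparity candidates, and at the sparsely labelled pixels it blends that distribution toward a unimodal Gaussian peak centred on the precise disparity label. The function name, the Gaussian target, and the blend weight `alpha` are illustrative assumptions.

```python
import numpy as np

def enhance_cost_volume(cost_volume, sparse_disp, valid_mask, sigma=1.0, alpha=0.5):
    """Blend a cost volume toward unimodal targets at sparsely labelled pixels.

    cost_volume : (D, H, W) array, a probability over D disparity candidates per pixel.
    sparse_disp : (H, W) array of disparity labels (meaningful only where valid_mask is True).
    valid_mask  : (H, W) boolean array marking the sparsely sampled label pixels.
    """
    D, H, W = cost_volume.shape
    disp_levels = np.arange(D).reshape(D, 1, 1)              # candidate disparities

    # Soft unimodal target: Gaussian peak around each sparse label,
    # normalised over the disparity dimension.
    target = np.exp(-0.5 * ((disp_levels - sparse_disp) / sigma) ** 2)
    target /= target.sum(axis=0, keepdims=True)

    enhanced = cost_volume.copy()
    # Blend only at labelled pixels; unlabelled pixels keep the original volume.
    enhanced[:, valid_mask] = ((1 - alpha) * cost_volume[:, valid_mask]
                               + alpha * target[:, valid_mask])
    return enhanced
```

In a trained network this guidance would enter through the loss or an attention-style reweighting rather than a fixed post-hoc blend; the sketch only shows how sparse precise labels can sharpen the disparity distribution at selected pixels while leaving the rest of the volume untouched.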
Acknowledgements
This work was supported by the Guangdong Basic and Applied Basic Research Foundation, Grant No. 2019A1515011078, and the Guangzhou Scientific and Technological Plan Project, No. 201904010228.
Author information
Contributions
All authors contributed to the research, experiments, and manuscript. Guangyi Huang and Yongyi Gong were responsible for the design of the algorithm and the preparation of the experiments. The experiments and related discussion were carried out by Qingzhen Xu, Shuang Liu, Guangyi Huang, Kun Zeng, Yongyi Gong, and Xiaonan Luo. Qingzhen Xu, Shuang Liu, and Guangyi Huang wrote the manuscript. Kun Zeng, Yongyi Gong, and Xiaonan Luo were responsible for the final optimization. All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Xu, Q., Liu, S., Huang, G. et al. CVE-Net: cost volume enhanced network guided by sparse features for stereo matching. Soft Comput 25, 15183–15199 (2021). https://doi.org/10.1007/s00500-021-06257-4