
Adaptive Visual-Depth Fusion Transfer

  • Conference paper
  • First Online:
Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11364)


Abstract

While RGB-D classification has been actively researched in recent years, most existing methods address the RGB-D source-to-target transfer task. Such methods cannot handle the real-world scenario in which paired depth images are unavailable. This paper focuses on a more flexible task that recognizes RGB test images by transferring them into the depth domain. Such a scenario retains high performance by exploiting auxiliary depth information, while removing the cost of pairing RGB cameras with depth sensors at test time. Existing methods face two challenges: how to exploit the additional depth features, and the domain shift caused by the different imaging mechanisms of conventional RGB cameras and depth sensors. As a step towards bridging this gap, we propose a novel method called adaptive Visual-Depth Fusion Transfer (aVDFT), which takes advantage of the depth information and handles the domain distribution mismatch simultaneously. Our key novelties are: (1) a global visual-depth metric construction algorithm that effectively aligns the RGB and depth data structures; (2) adaptive transformed component extraction for the target domain, conditioned on invariant transfer of location, scale, and depth measurements. To demonstrate the effectiveness of aVDFT, we conduct comprehensive experiments on six pairs of RGB-D datasets covering object recognition, scene classification, and gender recognition, and demonstrate state-of-the-art performance.
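
This page gives only the abstract, but the first novelty, aligning paired RGB and depth training features in a shared space so that RGB-only test images can still benefit from depth, can be illustrated with a minimal sketch. The sketch below uses canonical correlation analysis (CCA) plus a ridge regressor as generic stand-ins; all dimensions, variable names, and model choices are hypothetical assumptions, not the aVDFT algorithm itself.

```python
# Minimal sketch (NOT the paper's aVDFT): align paired RGB/depth training
# features with CCA, then classify RGB-only test images in the shared space.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for extracted features (dimensions are assumptions).
n_train, n_test, d_rgb, d_depth = 200, 50, 128, 64
X_rgb = rng.normal(size=(n_train, d_rgb))      # paired RGB training features
X_depth = rng.normal(size=(n_train, d_depth))  # paired depth training features
y_train = rng.integers(0, 5, size=n_train)     # class labels
X_rgb_test = rng.normal(size=(n_test, d_rgb))  # RGB-only test features

# 1) Learn a shared subspace in which the two views are maximally correlated.
cca = CCA(n_components=30)
cca.fit(X_rgb, X_depth)
Z_rgb, Z_depth = cca.transform(X_rgb, X_depth)

# 2) Learn to predict the depth view from the RGB view, so the depth cue
#    survives even when no depth sensor is present at test time.
depth_from_rgb = Ridge(alpha=1.0).fit(Z_rgb, Z_depth)

# 3) Fuse real (train) and estimated (test) depth views with the RGB views.
Z_train = np.hstack([Z_rgb, Z_depth])
Z_rgb_t = cca.transform(X_rgb_test)
Z_test = np.hstack([Z_rgb_t, depth_from_rgb.predict(Z_rgb_t)])

clf = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y_train)
print(clf.predict(Z_test[:5]))
```

On random features this of course predicts noise; the point is only the data flow the abstract describes: depth is used at training time and estimated, not required, at test time.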

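The second novelty, adaptive transformed component extraction under domain shift, is described only at a high level here. A classical baseline for that step is transfer component analysis (Pan et al., IEEE Trans. Neural Netw., 2011), which extracts components that shrink the maximum mean discrepancy (MMD) between domains. The sketch below implements that baseline, not the paper's adaptive, depth-conditioned variant, and every name and hyperparameter in it is an assumption.

```python
# TCA-style sketch (a classical baseline, NOT the paper's adaptive variant):
# extract components that reduce the MMD between source and target domains.
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def transfer_components(Xs, Xt, k=10, mu=1.0, gamma=None):
    """Embed source and target features into a shared k-dim space that
    trades off variance (kept high) against domain discrepancy (kept low)."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    K = rbf_kernel(np.vstack([Xs, Xt]), gamma=gamma)
    # MMD coefficient matrix L = e e^T and centering matrix H.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    # Generalized eigenproblem: maximize v^T (KHK) v / v^T (KLK + mu*I) v.
    A = K @ L @ K + mu * np.eye(n)   # domain discrepancy + regularizer
    B = K @ H @ K                    # centered variance
    vals, vecs = eigh(B, A)          # eigenvalues in ascending order
    W = vecs[:, -k:]                 # top-k transfer components
    Z = K @ W
    return Z[:ns], Z[ns:]

# Toy usage with a mean-shifted target domain (all numbers illustrative).
rng = np.random.default_rng(0)
Xs = rng.normal(size=(80, 32))           # source-domain features
Xt = rng.normal(loc=0.5, size=(60, 32))  # shifted target-domain features
Zs, Zt = transfer_components(Xs, Xt, k=5)
print(Zs.shape, Zt.shape)                # (80, 5) (60, 5)
```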


Acknowledgements

This work was sponsored by NUPTSF (Grant No. NY218120) and an MRC Innovation Fellowship (ref. MR/S003916/1).

Author information


Corresponding author

Correspondence to Ziyun Cai.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Cai, Z., Long, Y., Jing, XY., Shao, L. (2019). Adaptive Visual-Depth Fusion Transfer. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11364. Springer, Cham. https://doi.org/10.1007/978-3-030-20870-7_4


  • DOI: https://doi.org/10.1007/978-3-030-20870-7_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20869-1

  • Online ISBN: 978-3-030-20870-7

  • eBook Packages: Computer Science, Computer Science (R0)
