Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications

Li, Ruihao; Gu, Dongbing; Liu, Qiang; Long, Zhiqiang; Hu, Huosheng

doi:10.1007/s12559-017-9526-9

Published: 24 November 2017

Volume 10, pages 260–271, (2018)

Cognitive Computation

Ruihao Li ORCID: orcid.org/0000-0002-9839-1489¹,
Dongbing Gu¹,
Qiang Liu¹,
Zhiqiang Long² &
…
Huosheng Hu¹

Abstract

Semantic scene mapping is a challenging and significant task for robotic applications such as autonomous navigation and robot-environment interaction. In this paper, we propose a semantic pixel-wise mapping system for potential robotic applications. The system includes a novel spatio-temporal deep neural network for semantic segmentation and a Simultaneous Localisation and Mapping (SLAM) algorithm that builds a 3D point cloud map. Their combination yields a 3D semantic pixel-wise map. The proposed network consists of Convolutional Neural Networks (CNNs) with two streams: a spatial stream with images as the input and a temporal stream with image differences as the input. Because it uses both spatial and temporal information, it is called a spatio-temporal deep neural network, and it shows better accuracy and robustness in semantic segmentation. Further, only keyframes are selected for semantic segmentation, in order to reduce the computational burden for video streams and improve real-time performance. Based on the result of semantic segmentation, a 3D semantic map is built using the 3D point cloud map from the SLAM algorithm. The proposed spatio-temporal neural network is evaluated on both the Cityscapes benchmark (a public dataset) and the Essex Indoor benchmark (a dataset we labelled manually). Compared with state-of-the-art spatial-only neural networks, the proposed network achieves better performance in both pixel-wise accuracy and Intersection over Union (IoU) for scene segmentation. The 3D semantic map constructed with our methods is accurate and meaningful for robotic applications.
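
To make the quantities named in the abstract concrete, the following is a minimal, dependency-free sketch, not the authors' implementation. It shows one plausible form of the temporal-stream input (a signed per-pixel difference between consecutive frames; the paper's exact definition is not given here, so this is an assumption) and the standard definitions of pixel-wise accuracy and per-class IoU. The function names and the nested-list image representation are hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical representation: one integer (grey level or class label) per pixel.
Image = List[List[int]]


def frame_difference(prev: Image, curr: Image) -> Image:
    """Signed per-pixel difference curr - prev (assumed temporal-stream input)."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def pixel_accuracy(pred: Image, truth: Image) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(p == t for p, t in pairs) / len(pairs)


def class_iou(pred: Image, truth: Image, label: int) -> float:
    """IoU for one class: pixels labelled in both maps / pixels labelled in either."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            in_pred, in_truth = p == label, t == label
            inter += in_pred and in_truth
            union += in_pred or in_truth
    # A class absent from both maps has no defined IoU.
    return inter / union if union else float("nan")


# Toy 2x3 label maps with two classes (0 and 1), for illustration only.
truth = [[0, 0, 1], [0, 1, 1]]
pred = [[0, 1, 1], [0, 1, 0]]
accuracy = pixel_accuracy(pred, truth)
iou_class_1 = class_iou(pred, truth, label=1)
```

On the toy maps above, four of the six pixels match, so the pixel-wise accuracy is 2/3; class 1 is predicted on three pixels and present on three, with two in common and four in the union, so its IoU is 0.5. Benchmarks such as Cityscapes typically report the mean of the per-class IoU values.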

Acknowledgments

The authors would like to thank Robin Dowling for his support in experiments.

Funding

The first author has been financially supported by a scholarship from the China Scholarship Council.

Author information

Authors and Affiliations

Department of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK
Ruihao Li, Dongbing Gu, Qiang Liu & Huosheng Hu
College of Mechatronics and Automation, National University of Defense Technology, Changsha, China
Zhiqiang Long

Corresponding author

Correspondence to Ruihao Li.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(MP4 14.5 MB)

About this article

Cite this article

Li, R., Gu, D., Liu, Q. et al. Semantic Scene Mapping with Spatio-temporal Deep Neural Network for Robotic Applications. Cogn Comput 10, 260–271 (2018). https://doi.org/10.1007/s12559-017-9526-9

Received: 25 September 2017
Accepted: 31 October 2017
Published: 24 November 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s12559-017-9526-9
