Realtime Human Segmentation in Video

  • Conference paper
MultiMedia Modeling (MMM 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11296)

Abstract

Deep learning models have significantly improved human segmentation from a single image. However, applying a deep image segmentation model directly to video gives unsatisfactory results: the segmentation is discontinuous across frames, and the segmentation process is slow. To address these issues, we propose a new real-time video human segmentation framework, designed for videos of a single person, that produces smooth, accurate, and fast segmentation results. The framework consists of a fully convolutional network and a tracking module based on a level set algorithm: the fully convolutional network segments the human in the first frame of the video sequence, and the tracking module obtains the segmentation of each subsequent frame using the segmentation result of the previous frame as the initial segmentation. The fully convolutional network is trained on human image datasets. To evaluate the proposed framework, we have created and annotated a new single-person video dataset. The experimental results demonstrate accurate and smooth human segmentation at a much higher speed than using a deep human segmentation model alone.
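
The control flow described in the abstract (segment the first frame once, then propagate the mask frame by frame) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `segment_first_frame` is a plain intensity threshold standing in for the fully convolutional network, and `track_mask` is a simplified narrow-band region-competition update standing in for the level set tracker. Frames are grayscale images stored as plain Python lists so the example runs without dependencies.

```python
def segment_first_frame(frame, threshold=0.5):
    """Hypothetical stand-in for the fully convolutional network:
    a plain intensity threshold that returns a boolean mask."""
    return [[pixel > threshold for pixel in row] for row in frame]


def contour_band(mask):
    """Return the pixels that have a 4-neighbour with a different label,
    i.e. the one-pixel band on both sides of the current contour."""
    height, width = len(mask), len(mask[0])
    band = []
    for y in range(height):
        for x in range(width):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] != mask[y][x]:
                    band.append((y, x))
                    break
    return band


def region_means(frame, mask):
    """Mean intensity inside and outside the mask, or None if a region is empty."""
    inside, outside = [], []
    for row, mask_row in zip(frame, mask):
        for pixel, label in zip(row, mask_row):
            (inside if label else outside).append(pixel)
    if not inside or not outside:
        return None
    return sum(inside) / len(inside), sum(outside) / len(outside)


def track_mask(frame, prev_mask, max_iterations=20):
    """Simplified stand-in for the level set tracker: start from the previous
    frame's mask and relabel only pixels next to the contour, assigning each
    to the region whose mean intensity it is closer to."""
    mask = [row[:] for row in prev_mask]
    for _ in range(max_iterations):
        means = region_means(frame, mask)
        if means is None:
            break
        mean_in, mean_out = means
        updates = []
        for y, x in contour_band(mask):
            pixel = frame[y][x]
            label = (pixel - mean_in) ** 2 < (pixel - mean_out) ** 2
            if label != mask[y][x]:
                updates.append((y, x, label))
        if not updates:  # contour has stopped moving
            break
        for y, x, label in updates:
            mask[y][x] = label
    return mask


def segment_video(frames):
    """Segment the first frame once, then track the mask through the rest."""
    masks = [segment_first_frame(frames[0])]
    for frame in frames[1:]:
        masks.append(track_mask(frame, masks[-1]))
    return masks
```

Because the tracker only touches pixels adjacent to the contour, its per-frame cost scales with the contour length rather than the image area, which is one plausible reason a tracking step can be cheaper than running a network on every frame. It also means the contour moves at most about one pixel per iteration, so larger inter-frame motion needs more iterations.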

Author information

Corresponding author

Correspondence to Tairan Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, T., Lang, C., Xing, J. (2019). Realtime Human Segmentation in Video. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_17

  • DOI: https://doi.org/10.1007/978-3-030-05716-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05715-2

  • Online ISBN: 978-3-030-05716-9

  • eBook Packages: Computer Science, Computer Science (R0)
