Multi-modal RGB–Depth–Thermal Human Body Segmentation

Palmero, Cristina; Clapés, Albert; Bahnsen, Chris; Møgelmose, Andreas; Moeslund, Thomas B.; Escalera, Sergio

doi:10.1007/s11263-016-0901-x

Published: 13 April 2016

Volume 118, pages 217–239 (2016)

International Journal of Computer Vision

Cristina Palmero ORCID: orcid.org/0000-0002-6085-6527^1,2,
Albert Clapés^1,2,
Chris Bahnsen³,
Andreas Møgelmose³,
Thomas B. Moeslund³ &
…
Sergio Escalera^1,2

Abstract

This work addresses human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, partitions the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extraction methods, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, outperforms other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground truth of human segmentations.
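
To illustrate the overlap figure quoted in the abstract, the following minimal Python sketch computes the overlap between a predicted and a ground-truth binary mask. It assumes that overlap means intersection over union (the Jaccard index, as used in the PASCAL VOC challenge); the abstract does not define the measure. The function name, the list-of-lists mask format, the convention for two empty masks, and the toy masks are all hypothetical and are not taken from the article.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # hypothetical format: rows of 0/1 pixel labels


def overlap(predicted: Mask, ground_truth: Mask) -> float:
    """Intersection over union of two binary masks of identical size.

    Assumption: this is the overlap measure meant in the abstract.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for p_row, g_row in zip(predicted, ground_truth):
        if len(p_row) != len(g_row):
            raise ValueError("masks must have the same number of columns")
        for p, g in zip(p_row, g_row):
            p_on, g_on = bool(p), bool(g)
            intersection += p_on and g_on  # pixel is human in both masks
            union += p_on or g_on          # pixel is human in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy 3x4 masks (1 = human pixel): 4 shared pixels, 6 pixels in the union.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

print(f"overlap = {overlap(predicted, ground_truth):.3f}")
```

Under this assumption, a segmentation scoring above 75% shares more than three quarters of the pixels in the union of the predicted and annotated human regions.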

Notes

This is an implementation of the work of Bradski and Kaehler (2008), which can be found at http://code.opencv.org.
http://pointclouds.org/documentation/tutorials/gpu_people.php.
https://github.com/PointCloudLibrary/data/tree/master/people/results.
Shotton et al. (2011) specified in the “Acknowledgements” section that the tracking system of Kinect SDK was built based on the research they presented in the paper.
See the video included as supplementary material, named trimodal_seg_results.mp4, which shows some qualitative results.

References

Abidi, B. (2007). IRIS thermal/visible face database. DOE University Research Program in Robotics under grant DOE-DE-FG02-86NE37968
Alahari, K., Seguin, G., Sivic, J., & Laptev, I. (2013). Pose estimation and segmentation of people in 3D movies. In IEEE international conference on computer vision (ICCV 2013).
Alpert, S., Galun, M., Basri, R., & Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE conference on computer vision and pattern recognition, 2007 (CVPR ’07) (pp. 1–8). doi:10.1109/CVPR.2007.383017.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: people detection and articulated pose estimation. In IEEE conference on computer vision and pattern recognition, 2009 (CVPR 2009) (pp. 1014–1021).
Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In IEEE conference on computer vision and pattern recognition, 2010 (CVPR 2010) (pp. 623–630).
Barbosa, I.B., Cristani, M., Del Bue, A., Bazzani, L., & Murino, V. (2012). Re-identification with RGB-D sensors. In Computer vision–ECCV 2012. Workshops and demonstrations (pp. 433–442). Berlin: Springer.
Bertozzi, M., Broggi, A., Gomez, C.H., Fedriga, R.I., Vezzoni, G., & Del Rose, M. (2007). Pedestrian detection in far infrared images based on the use of probabilistic templates. In Intelligent vehicles symposium, 2007 IEEE (pp. 327–332). Piscataway: IEEE.
Bouguet, J. Y. (2004). Camera calibration toolbox for MATLAB.
Bourdev, L., & Malik, J. (2009). Poselets: body part detectors trained using 3D human pose annotations. In IEEE 12th international conference on computer vision, 2009 (pp. 1365–1372).
Bouwmans, T. (2011). Recent advanced statistical background modeling for foreground detection: A systematic survey. Recent Patents on Computer Science, 4(3), 147–176.
Bouwmans, T., El Baf, F., Vachon, B., et al. (2008). Background modeling using mixture of gaussians for foreground detection: A survey. Recent Patents on Computer Science, 1(3), 219–237.
Boykov, Y. Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in ND images. In Proceedings of eighth IEEE international conference on computer vision, 2001 (ICCV 2001) (Vol. 1, pp. 105–112).
Bradski, G., & Kaehler, A. (2008). Learning OpenCV: Computer vision with the OpenCV library. Sebastopol: O’Reilly.
Bray, M., Kohli, P., & Torr, P.H.S. (2006). Posecut: Simultaneous segmentation and 3D pose estimation of humans using dynamic graph-cuts. In Computer vision–ECCV 2006 (pp. 642–655). Berlin: Springer.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Brkić, K., Rašić, S., Pinz, A., Šegvić, S., & Kalafatić, Z. (2013). Combining spatio-temporal appearance descriptors and optical flow for human action recognition in video data. arXiv:1310.0308.
Buys, K., Cagniart, C., Baksheev, A., De Laet, T., De Schutter, J., & Pantofaru, C. (2014). An adaptable system for RGB-D based human body detection and pose estimation. Journal of Visual Communication and Image Representation, 25(1), 39–52.
Camplani, M., & Salgado, L. (2014). Background foreground segmentation with RGB-D Kinect data: An efficient combination of classifiers. Journal of Visual Communication and Image Representation, 25(1), 122–136.
Carson, C., Belongie, S., Greenspan, H., & Malik, J. (2002). Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8), 1026–1038.
Charles, J., & Everingham, M. (2011). Learning shape models for monocular human pose estimation from the Microsoft Xbox Kinect. In 2011 IEEE international conference on computer vision workshops (ICCV Workshops) (pp. 1202–1208).
Chun, S.Y., & Lee, C.S. (2013). Applications of human motion tracking: Smart lighting control. In 2013 IEEE conference on computer vision and pattern recognition workshops (CVPRW) (pp. 387–392).
Clapés, A., Reyes, M., & Escalera, S. (2012). User identification and object recognition in clutter scenes based on RGB-Depth analysis. In Articulated motion and deformable objects (pp. 1–11). Berlin: Springer.
Cohen, W. W. (2005). Stacked sequential learning. DTIC Document: Technical report.
Dai, C., Zheng, Y., & Li, X. (2007). Pedestrian detection and tracking in infrared imagery using shape and appearance. Computer Vision and Image Understanding, 106(2), 288–299.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition, 2005 (CVPR 2005) (Vol. 1, pp. 886–893).
Dalal, N., Triggs, B., & Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In Computer vision–ECCV 2006 (pp. 428–441). Berlin: Springer.
Davis, J. W., & Sharma, V. (2004). Robust background-subtraction for person detection in thermal imagery. In IEEE international workshop on object tracking and classification beyond the visible spectrum.
Davis, J. W., & Sharma, V. (2007). Background-subtraction using contour-based fusion of thermal and visible imagery. Computer Vision and Image Understanding, 106(2), 162–182.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2012). The PASCAL visual object classes challenge 2012 results. See http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
Fanelli, G., Dantone, M., Gall, J., Fossati, A., & Van Gool, L. (2013). Random forests for real time 3D face analysis. International Journal of Computer Vision, 101(3), 437–458.
Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. In Image analysis (pp. 363–370). Berlin: Springer.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Fernández-Caballero, A., Castillo, J. C., Serrano-Cuerda, J., & Maldonado-Bascón, S. (2011). Real-time human segmentation in infrared videos. Expert Systems with Applications, 38(3), 2577–2584.
Fernández-Sánchez, E. J., Díaz, J., & Ros, E. (2013). Background subtraction based on color and depth using active sensors. Sensors, 13(7), 8895–8915.
Fidler, S., Mottaghi, R., Yuille, A., & Urtasun, R. (2013). Bottom-up segmentation for top-down detection. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3294–3301).
Gade, R., & Moeslund, T. B. (2014). Thermal cameras and applications: A survey. Machine Vision and Applications, 25(1), 245–262.
Gade, R., Jorgensen, A., & Moeslund, T. B. (2013). Long-term occupancy analysis using graph-based optimisation in thermal imagery. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3698–3705).
Giordano, D., Palazzo, S., & Spampinato, C. (2014). Kernel density estimation using joint spatial-color-depth data for background modeling. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 4388–4393). Piscataway: IEEE.
Girshick, R. B., Felzenszwalb, P. F., & McAllester, D. A. (2011). Object detection with grammar models. In Advances in neural information processing systems (pp. 442–450).
Gordon, G., Darrell, T., Harville, M., & Woodfill, J. (1999). Background estimation and removal based on range and color. In IEEE computer society conference on computer vision and pattern recognition, 1999 (Vol. 2).
Gulshan, V., Lempitsky, V., & Zisserman, A. (2011). Humanising GrabCut: learning to segment humans using the Kinect. In 2011 IEEE International conference on computer vision workshops (ICCV workshops) (pp. 1127–1133).
Hernández-Vela, A., Bautista, M. A., Perez-Sala, X., Ponce, V., Baró, X., Pujol, O., et al. (2012a). BoVDW: Bag-of-Visual-and-Depth-Words for gesture recognition. In 2012 21st International conference on pattern recognition (ICPR) (pp. 449–452). Piscataway: IEEE.
Hernández-Vela, A., Zlateva, N., Marinov, A., Reyes, M., Radeva, P., Dimov, D., & Escalera, S. (2012b). Graph cuts optimization for multi-limb human segmentation in depth maps. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 726–732).
Hg, R. I., Jasek, P., Rofidal, C., Nasrollahi, K., Moeslund, T. B., Tranchet, G., et al. (2012). An RGB-D database using Microsoft’s Kinect for Windows for face detection. In 2012 eighth international conference on signal image technology and internet based systems (SITIS) (pp. 42–46). Piscataway: IEEE.
Holt, B., Ong, E.J., Cooper, H., & Bowden, R. (2011). Putting the pieces together: Connected poselets for human pose estimation. In 2011 IEEE international conference on computer vision workshops (ICCV workshops) (pp. 1196–1201).
Huynh, T., Min, R., & Dugelay, J. L. (2013). An efficient LBP-based descriptor for facial depth images applied to gender recognition using RGB-D face data. In Computer vision-ACCV 2012 workshops (pp. 133–145). Berlin: Springer.
Irani, R., Nasrollahi, K., Oliu, M., Corneanu, C., Escalera, S., Bahnsen, C., Lundtoft, D., Moeslund, T. B., Pedersen, T., Klitgaa, M.L., & Petrini, L. (2015). Spatiotemporal analysis of rgb-d-t facial images for multi-modal pain level recognition. In IEEE conference on computer vision and pattern recognition workshop.
Koppula, H. S., Gupta, R., & Saxena, A. (2013). Learning human activities and object affordances from RGB-D videos. The International Journal of Robotics Research, 32(8), 951–970.
Kumar, M.P., Torr, P. H. S., & Zisserman, A. (2005). OBJ CUT. In IEEE computer society conference on computer vision and pattern recognition, 2005 (CVPR 2005) (Vol. 1, pp. 18–25).
Ladický, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. H. S. (2010). What, where and how many? Combining object detectors and CRFs. In Computer vision–ECCV 2010 (pp. 424–437). Berlin: Springer.
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Workshop on statistical learning in computer vision, ECCV (Vol. 2, p. 7).
Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1–3), 259–289.
Levin, A., & Weiss, Y. (2006). Learning to combine bottom-up and top-down segmentation. In Computer vision–ECCV 2006 (pp. 581–594). Berlin: Springer.
Leykin, A., & Hammoud, R. (2006). Robust multi-pedestrian tracking in thermal-visible surveillance videos. In IEEE conference on computer vision and pattern recognition workshop, 2006 (CVPRW’06) (p. 136).
Leykin, A., Ran, Y., & Hammoud, R. (2007). Thermal-visible video fusion for moving target tracking and pedestrian classification. In IEEE conference on computer vision and pattern recognition, 2007 (CVPR’07) (pp. 1–8).
Lin, Z., Davis, L.S., Doermann, D., & DeMenthon, D. (2007). An interactive approach to pose-assisted and appearance-based segmentation of humans. In IEEE 11th international conference on computer vision, 2007 (ICCV 2007) (pp. 1–8).
Lopes, O., Reyes, M., Escalera, S., & Gonzalez, J. (2014). Spherical blurred shape model for 3D object and pose recognition: Quantitative analysis and hci applications in smart environments.
Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of eighth IEEE international conference on computer vision, 2001 (ICCV 2001) (Vol. 2, pp. 416–423).
Mittal, A., Zhao, L., & Davis, L. S. (2003). Human body pose estimation using silhouette shape analysis. In Proceedings of IEEE conference on advanced video and signal based surveillance, 2003 (pp. 263–270).
Moeslund, T. B. (2011). Visual analysis of humans: Looking at people. London: Springer.
Møgelmose, A., Bahnsen, C., Moeslund, T., Clapés, A., & Escalera, S. (2013). Tri-modal person re-identification with rgb, depth and thermal features. In IEEE conference on computer vision and pattern recognition workshops (CVPRW), 2013 (pp. 301–307). doi:10.1109/CVPRW.2013.52.
Mori, G., Ren, X., Efros, A. A., & Malik, J. (2004). Recovering human body configurations: combining segmentation and recognition. In Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004 (CVPR 2004) (Vol. 2, pp. II-326).
Nghiem, A.T., Bremond, F., Thonnat, M., & Valentin, V. (2007). ETISEO, performance evaluation for video surveillance systems. In IEEE conference on advanced video signal based surveillance, 2007 (AVSS 2007) (pp. 476–481).
Nikisins, O., Nasrollahi, K., Greitans, M., & Moeslund, T. (2014). Rgb-d-t based face recognition. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 1716–1721).
Olmeda, D., de la Escalera, A., & Armingol, J. M. (2012). Contrast invariant features for human detection in far infrared images. In 2012 IEEE Intelligent Vehicles Symposium (IV) (pp. 117–122).
Oreifej, O., & Liu, Z. (2013). HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 716–723).
Otsu, N. (1975). A threshold selection method from gray-level histograms. Automatica, 11(285–296), 23–27.
Pirsiavash, H., & Ramanan, D. (2012). Steerable part models. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3226–3233).
Plagemann, C., Ganapathi, V., Koller, D., & Thrun, S. (2010). Real-time identification and localization of body parts from depth images. In 2010 IEEE international conference on robotics and automation (ICRA) (pp. 3108–3113).
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28, 976–990. doi:10.1016/j.imavis.2009.11.014.
Puertas, E., Escalera, S., & Pujol, O. (2013). Generalized multi-scale stacked sequential learning for multi-class classification. Pattern Analysis and Applications, 1–15.
Pugeault, N., & Bowden, R. (2011). Spelling it out: Real-time ASL fingerspelling recognition. In 2011 IEEE International conference on computer vision workshops (ICCV workshops) (pp. 1114–1119).
Ramanan, D. (2006). Learning to parse images of articulated bodies. In Advances in neural information processing systems (pp. 1129–1136).
Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: interactive foreground extraction using iterated graph cuts. In ACM transactions on graphics (TOG) (Vol. 23, pp. 309–314). ACM.
Scharwächter, T., Enzweiler, M., Franke, U., & Roth, S. (2013). Efficient multi-cue scene segmentation. In Pattern Recognition (pp. 435–445).
Schwarz, L.A., Mkhitaryan, A., Mateus, D., & Navab, N. (2011). Estimating human 3D pose from time-of-flight images based on geodesic distances and optical flow. In 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) (pp. 700–706).
Sheasby, G., Warrell, J., Zhang, Y., Crook, N., & Torr, P.H.S. (2012). Simultaneous human segmentation, depth and pose estimation via dual decomposition. In British Machine Vision Conference, Student Workshop, BMVW.
Sheasby, G., Valentin, J., Crook, N., & Torr, P. (2013). A robust stereo prior for human segmentation. In Computer Vision–ACCV 2012 (pp. 94–107). Berlin: Springer.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’11) (pp. 1297–1304). Washington, DC: IEEE Computer Society. doi:10.1109/CVPR.2011.5995316.
Spinello, L., & Arras, K.O. (2011). People detection in RGB-D data. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 3838–3843).
Stauffer, C., & Grimson, W.E.L. (1999). Adaptive background mixture models for real-time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999 (Vol. 2).
Stefańczyk, M., & Kasprzak, W. (2012). Multimodal segmentation of dense depth maps and associated color information. In Computer vision and graphics (pp. 626–632). Berlin: Springer.
Suard, F., Rakotomamonjy, A., Bensrhair, A., & Broggi, A. (2006). Pedestrian detection using infrared images and histograms of oriented gradients. In Intelligent Vehicles Symposium, 2006 IEEE (pp. 206–212).
Susperregi, L., Martínez-Otzeta, J.M., Ansuategui, A., Ibarguren, A., & Sierra, B. (2013). RGB-D, laser and thermal sensor fusion for people following in a mobile robot. International Journal of Advanced Robotic Systems, 10.
Teichman, A., & Thrun, S. (2013). Learning to segment and track in RGB-D. In Algorithmic foundations of robotics X (pp. 575–590). Berlin: Springer.
Vidas, S., Lakemond, R., Denman, S., Fookes, C., Sridharan, S., & Wark, T. (2012). A mask-based approach for the geometric calibration of thermal-infrared cameras. IEEE Transactions on Instrumentation and Measurement, 61(6), 1625–1635.
Vineet, V., Sheasby, G., Warrell, J., & Torr, P. H. S. (2013). PoseField: An efficient mean-field based method for joint estimation of human pose, segmentation, and depth. In Energy minimization methods in computer vision and pattern recognition (pp. 180–194). Berlin: Springer.
Viola, P., Jones, M. J., & Snow, D. (2005). Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, 63(2), 153–161.
Wang, L., Qiao, Y., & Tang, X. (2013). Motionlets: mid-level 3D parts for human motion recognition. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2674–2681).
Wang, W., Zhang, J., & Shen, C. (2010). Improved human detection and classification in thermal images. In 2010 17th IEEE International Conference on Image Processing (ICIP) (pp. 2313–2316).
Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1705–1712).
Windheuser, T., Schlickewei, U., Schmidt, F.R., & Cremers, D. (2011). Geometrically consistent elastic matching of 3D shapes: a linear programming solution. In 2011 IEEE international conference on computer vision (ICCV) (pp. 2134–2141).
Wolf, C., Mille, J., Lombardi, E., Celiktutan, O., Jiu, M., Baccouche, M., Dellandréa, E., Bichot, C.E., Garcia, C., & Sankur, B. (2012). The LIRIS human activities dataset and the ICPR 2012 human activities recognition and localization competition. In LIRIS UMR 5205 CNRS/INSA Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/École Cent.
Xia, L., Chen, C.C., & Aggarwal, J.K. (2011). Human detection using depth information by Kinect. In 2011 IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW) (pp. 15–22).
Yang, Y., & Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1385–1392).
Yang, Y., & Ramanan, D. (2013). Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878–2890.
Yao, B., & Fei-Fei, L. (2010). Grouplet: a structured image representation for recognizing human and object interactions. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 9–16).
Zhang, L., Wu, B., & Nevatia, R. (2007). Pedestrian detection in infrared images based on local shape features. In IEEE Conference on Computer Vision and Pattern Recognition, 2007 (CVPR’07) (pp. 1–8).
Zhao, J., & Sen-ching, S.C. (2012). Human segmentation by geometrically fusing visible-light and thermal imageries. Multimedia Tools and Applications, 1–29.
Zhu, L., Chen, Y., Lu, Y., Lin, C., & Yuille, A. (2008). Max margin AND/OR graph learning for parsing the human body. In IEEE conference on computer vision and pattern recognition, 2008 (CVPR 2008) (pp. 1–8).
Zivkovic, Z. (2004). Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th international conference on pattern recognition, 2004 (ICPR 2004) (Vol. 2, pp. 28–31).

Acknowledgments

This work was partly supported by the Spanish Project TIN2013-43478-P. The work of Albert Clapés was supported by SUR-DEC of the Generalitat de Catalunya and FSE. We would like to thank Anders Jørgensen for his valuable help in capturing the dataset.

Author information

Authors and Affiliations

Dept. Matemàtica Aplicada i Anàlisi, UB, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain
Cristina Palmero, Albert Clapés & Sergio Escalera
Computer Vision Center, Campus UAB, Edifici O, 08193, Cerdanyola del Vallès, Spain
Cristina Palmero, Albert Clapés & Sergio Escalera
Aalborg University, Sofiendalsvej 11, 9200, Aalborg SV, Denmark
Chris Bahnsen, Andreas Møgelmose & Thomas B. Moeslund

Corresponding author

Correspondence to Cristina Palmero.

Additional information

Communicated by Junsong Yuan, Wanqing Li, Zhengyou Zhang, David Fleet, Jamie Shotton.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 27017 KB)

About this article

Cite this article

Palmero, C., Clapés, A., Bahnsen, C. et al. Multi-modal RGB–Depth–Thermal Human Body Segmentation. Int J Comput Vis 118, 217–239 (2016). https://doi.org/10.1007/s11263-016-0901-x

Received: 11 November 2014
Accepted: 14 March 2016
Published: 13 April 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11263-016-0901-x
