Scene Understanding by Reasoning Stability and Safety

International Journal of Computer Vision

Abstract

This paper presents a new perspective on 3D scene understanding: reasoning about object stability and safety using intuitive mechanics. Our approach rests on a simple observation: by human design, objects in static scenes should be stable in the gravity field and safe with respect to various physical disturbances, such as human activities. This assumption applies to all scene categories and imposes useful constraints on the plausible interpretations (parses) in scene understanding. Given a 3D point cloud of a static scene captured by depth cameras, our method consists of three steps: (i) recovering solid 3D volumetric primitives from voxels; (ii) reasoning about stability by grouping unstable primitives into physically stable objects, optimizing both stability and the scene prior; and (iii) reasoning about safety by evaluating the physical risks to objects under disturbances such as human activity, wind or earthquakes. We adopt a novel intuitive physics model and represent the energy landscape of each primitive and object in the scene by a disconnectivity graph (DG). We construct a contact graph whose nodes are 3D volumetric primitives and whose edges represent supporting relations. We then adopt a Swendsen–Wang Cuts algorithm to partition the contact graph into groups, each of which is a stable object. To detect unsafe objects in a static scene, our method further infers hidden and situated causes (disturbances) in the scene and then applies intuitive mechanics to predict their possible effects (e.g., falls). In experiments, we demonstrate that the algorithm achieves substantially better performance than other state-of-the-art methods for (i) object segmentation, (ii) 3D volumetric recovery, and (iii) scene understanding. We also compare the safety predictions of the intuitive mechanics model with human judgement.
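The energy-barrier idea behind the stability and safety reasoning can be illustrated with a minimal sketch, assuming a single uniform rigid cuboid resting on a flat face under gravity. The object sits in a local minimum of potential energy; to fall, its center of mass must first be raised until it lies directly above a bottom edge, and that rise times m·g is the barrier. This is not the paper's disconnectivity-graph model, its contact-graph grouping, or its disturbance inference: the function names `tipping_barrier` and `falls_under`, and the rule of comparing a disturbance's work against the lowest barrier, are hypothetical simplifications chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def tipping_barrier(mass, width, depth, height, g=G):
    """Hypothetical helper: minimum energy (J) needed to tip a uniform rigid
    cuboid, resting on its width x depth face, over one of its bottom edges.

    The center of mass starts at height h/2. To tip over an edge, it must be
    raised until it is directly above that edge, i.e. to a height equal to
    its distance from the edge: sqrt((s/2)^2 + (h/2)^2), where s is the base
    side perpendicular to the edge.
    """
    if min(mass, width, depth, height) <= 0:
        raise ValueError("mass and dimensions must be positive")
    h_half = height / 2.0
    barriers = []
    for side in (width, depth):
        # Height of the center of mass when balanced exactly on the edge.
        peak = math.hypot(side / 2.0, h_half)
        barriers.append(mass * g * (peak - h_half))
    # The object escapes its energy basin through the lowest barrier.
    return min(barriers)


def falls_under(mass, width, depth, height, disturbance_work, g=G):
    """Hypothetical safety test (an assumption, not the paper's rule): the
    object is predicted to fall if the work delivered by a disturbance
    exceeds its lowest tipping barrier."""
    return disturbance_work > tipping_barrier(mass, width, depth, height, g)


if __name__ == "__main__":
    tall = tipping_barrier(1.0, 0.1, 0.1, 0.4)   # tall, narrow 1 kg object
    squat = tipping_barrier(1.0, 0.4, 0.4, 0.1)  # low, wide 1 kg box
    print(f"tall: {tall:.3f} J, squat: {squat:.3f} J")
```

With these made-up dimensions, the tall object's barrier is roughly 0.06 J while the squat box's is roughly 1.53 J, so the same small disturbance can topple the first but not the second. This matches the intuition that tall, narrow objects are less safe, and it is the quantity a richer energy-landscape model would generalize to arbitrary shapes, supports and poses.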



Acknowledgments

This work is supported by (1) MURI ONR N00014-10-1-0933 and DARPA MSEE grant FA 8650-11-1-7149, USA; (2) Next-generation Energies for Tohoku Recovery (NET) and the SCOPE Program of the Ministry of Internal Affairs and Communications, Japan; and (3) the 10th Core Project Grant of Microsoft Japan.

Author information

Correspondence to Bo Zheng.

Additional information

Communicated by Derek Hoiem, James Hays, Jianxiong Xiao, and Aditya Khosla.

About this article

Cite this article

Zheng, B., Zhao, Y., Yu, J. et al. Scene Understanding by Reasoning Stability and Safety. Int J Comput Vis 112, 221–238 (2015). https://doi.org/10.1007/s11263-014-0795-4

  • DOI: https://doi.org/10.1007/s11263-014-0795-4
