Bottom-Up Processing in Complex Scenes: A Unifying Perspective on Segmentation, Fixation Saliency, Candidate Regions, Base-Detail Decomposition, and Image Enhancement

  • Chapter
Recent Progress in Brain and Cognitive Engineering

Part of the book series: Trends in Augmentation of Human Performance (TAHP, volume 5)

Abstract

Early visual processing should offer efficient bottom-up mechanisms that simplify visual information, enhance it, and direct attention so that high-level processing becomes more efficient. Based on these considerations, we propose a unified approach that addresses a set of fundamental early visual processes: segmentation, candidate regions, base-detail decomposition, image enhancement, and saliency for fixation prediction. We argue that for complex scenes all of these processes require hierarchical segmentwise processing. Furthermore, we argue that some of these visual tasks require the ability to decompose the appearance of each segment into a “base” appearance and a “detail” appearance. An important and surprising result of this decomposition is a novel method for successfully predicting human eye fixations. Our hypothesis is that we fixate on segments that are not easy to model, e.g., segments that are small but have a lot of detail, in order to obtain a higher-resolution representation for further analysis. We report performance on psychophysics data for the Pascal VOC dataset, whose images are non-iconic and particularly difficult for state-of-the-art saliency algorithms.
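The hypothesis that small, detail-rich segments attract fixations can be illustrated with a toy sketch. This is not the chapter's model: it assumes, purely for illustration, that the "base" of a segment is its mean intensity, that the "detail" is the residual, and that a segment's score is its mean absolute detail divided by the square root of its area. The function names `base_detail` and `segment_saliency` are hypothetical.

```python
import numpy as np


def base_detail(image, labels):
    """Split a grayscale image into a base layer and a detail layer.

    Assumption (illustrative only): the base of each segment is its mean
    intensity, and the detail is whatever the base fails to explain.
    """
    image = np.asarray(image, dtype=float)
    base = np.zeros_like(image)
    for seg in np.unique(labels):
        mask = labels == seg
        base[mask] = image[mask].mean()
    detail = image - base
    return base, detail


def segment_saliency(image, labels):
    """Score each segment: high when it is small and poorly fit by its base.

    Assumption (illustrative only): score = mean |detail| / sqrt(area).
    """
    _, detail = base_detail(image, labels)
    scores = {}
    for seg in np.unique(labels):
        mask = labels == seg
        area = mask.sum()
        scores[int(seg)] = float(np.abs(detail[mask]).mean() / np.sqrt(area))
    return scores


if __name__ == "__main__":
    # A flat 4x4 background (segment 0) with a small two-pixel textured
    # segment (segment 1) in the top-left corner.
    image = np.zeros((4, 4))
    image[0, 0] = 1.0
    labels = np.zeros((4, 4), dtype=int)
    labels[0, :2] = 1
    print(segment_saliency(image, labels))
```

In this toy setting the flat background scores zero because its base explains it completely, while the small textured segment receives a positive score. A hierarchical version would repeat the same computation at each level of a segmentation hierarchy.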

Acknowledgements

We would like to thank Laurent Itti, Li Zhaoping, John Flynn, and the reviewers for their valuable comments. This work is partially supported by NSF award CCF-1317376, by ONR N00014-12-1-0883 and by NVidia Corp.

Author information

Corresponding author

Correspondence to Boyan Bonev.

Copyright information

© 2015 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Bonev, B., Yuille, A.L. (2015). Bottom-Up Processing in Complex Scenes: A Unifying Perspective on Segmentation, Fixation Saliency, Candidate Regions, Base-Detail Decomposition, and Image Enhancement. In: Lee, SW., Bülthoff, H., Müller, KR. (eds) Recent Progress in Brain and Cognitive Engineering. Trends in Augmentation of Human Performance, vol 5. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7239-6_8
