Abstract

This chapter\(^\dagger\) presents a generative framework that uses influence diagrams to fuse metadata from multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and a semantic ontology in a synergistic way. We use causal strengths to encode causal relationships between variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach achieves high-quality photo annotation with good interpretability, and that it substantially outperforms traditional methods.

© ACM, 2005. This chapter is a minor revision of the author’s work with Yi Wu and Belle Tseng [1] published in MULTIMEDIA’05. Permission to publish this chapter is granted under copyright license #2587660180893.
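The abstract does not spell out how several causal strengths are combined into a single label probability. As a minimal sketch, assuming each observed context or content variable acts as an independent generative cause of a label (the noisy-OR combination used in Cheng's causal-power framework, which is not necessarily the rule this chapter adopts), the fusion step could look as follows. The function `fuse_noisy_or`, the `leak` term for unobserved background causes, and the variable names and strength values are all hypothetical.

```python
import math


def fuse_noisy_or(strengths: dict[str, float], observed: set[str],
                  leak: float = 0.0) -> float:
    """Probability that a label is present, given which causes were observed.

    Assumption (illustrative only): each observed cause independently
    produces the label with probability equal to its causal strength
    (noisy-OR), and `leak` stands in for unobserved background causes.
    """
    for name, s in strengths.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"strength for {name!r} must lie in [0, 1]")
    if not 0.0 <= leak <= 1.0:
        raise ValueError("leak must lie in [0, 1]")
    # Probability that neither the leak nor any observed cause produces the label.
    p_absent = (1.0 - leak) * math.prod(
        1.0 - s for name, s in strengths.items() if name in observed
    )
    return 1.0 - p_absent


if __name__ == "__main__":
    # Hypothetical context/content variables with arcs into the label "outdoor".
    strengths = {"daytime": 0.5, "flash_off": 0.4, "sky_region": 0.6}
    print(fuse_noisy_or(strengths, {"daytime", "flash_off", "sky_region"}))
    print(fuse_noisy_or(strengths, {"daytime"}))
```

Under this assumption, every additional observed cause can only raise the label probability; causes that suppress a label, or causes that interact, would need a different combination rule.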

Notes

  1. We use “network” and “graph” interchangeably to refer to an “influence diagram.” A network, a graph, and an influence diagram differ mainly in how the weights of the edges are measured (this will become evident in Sect. 7.4.2). Apart from this difference, an influence diagram, or a probabilistic causal model under the assumption of the causal Markov condition, is a Bayesian network [30].

  2. In general, when two variables u and d are dependent, the dependence alone does not tell us which causes which. For photo annotation, however, we can determine the direction of the arcs from domain knowledge.

  3. We changed the term \(P(u | \overline{d}, \xi)\) in [13] to \(P(u | \xi)\) in the formula, because \(\overline{d}\) could be interpreted as the negation (instead of absence) of d.

  4. To conserve space, we draw the influence diagrams using only context and content features. Relationships between semantic labels can be found in Fig. 7.2.
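Note 3 indicates that the chapter's causal-strength formula adapts the one in [13] by using \(P(u | \xi)\) as the baseline instead of \(P(u | \overline{d}, \xi)\). As a minimal sketch, assuming that formula takes the form of Cheng's generative causal power, \(CS = (P(u | d, \xi) - P(u | \xi)) / (1 - P(u | \xi))\), and that the probabilities are estimated by frequency counts over paired binary observations, the strength of an arc d → u could be computed as follows. The function name and the example data are hypothetical.

```python
from collections.abc import Sequence


def causal_strength(cause: Sequence[bool], effect: Sequence[bool]) -> float:
    """Estimate the generative causal strength of d (cause) on u (effect).

    Assumed form (Cheng's generative causal power, with the baseline
    P(u | not d, xi) replaced by P(u | xi) as in note 3):
        CS = (P(u | d, xi) - P(u | xi)) / (1 - P(u | xi))
    Probabilities are plain frequency estimates over paired observations.
    A negative result means d lowers the probability of u; this sketch does
    not implement a separate preventive-strength formula.
    """
    if len(cause) != len(effect) or len(cause) == 0:
        raise ValueError("need equal-length, non-empty observation sequences")
    p_u = sum(effect) / len(effect)  # P(u | xi): overall frequency of u
    n_d = sum(cause)
    if n_d == 0:
        raise ValueError("d never occurs, so P(u | d, xi) is undefined")
    # P(u | d, xi): frequency of u among observations where d is present.
    p_u_given_d = sum(c and e for c, e in zip(cause, effect)) / n_d
    if p_u == 1.0:
        raise ValueError("u is always present, so the strength is undefined")
    return (p_u_given_d - p_u) / (1.0 - p_u)


if __name__ == "__main__":
    # Hypothetical observations over eight photos; d and u are binary variables.
    d = [True, True, True, True, False, False, False, False]
    u = [True, True, True, False, True, False, False, False]
    print(causal_strength(d, u))  # prints 0.5
```

Normalizing by \(1 - P(u | \xi)\) measures how much of the remaining "room" for u that d fills, so a cause that raises an already common label by a small amount can still receive a high strength.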

References

  1. Y. Wu, E.Y. Chang, B.L. Tseng, Multimodal metadata fusion using causal strength, in Proceedings of ACM Multimedia, pp. 872–881, 2005

  2. B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18, 837–842 (1996)

  3. Y. Rui, T.S. Huang, S.F. Chang, Image retrieval: current techniques, promising directions and open issues. J. Vis. Commun. Image Represent. (1999)

  4. D.G. Lowe, Object recognition from local scale-invariant features, in Proceedings of IEEE ICCV, pp. 1150–1157, 1999

  5. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  6. M. Boutell, J. Luo, Bayesian fusion of camera metadata cues in semantic scene classification, in Proceedings of IEEE CVPR, pp. 623–630, 2004

  7. M. Naaman, A. Paepcke, H. Garcia-Molina, From where to what: metadata sharing for digital photographs with geographic coordinates, in Proceedings of the International Conference on Cooperative Information Systems (CoopIS), pp. 196–217, 2003

  8. E.Y. Chang, Extent: fusing context, content, and semantic ontology for photo annotation, in Proceedings of ACM Workshop on Computer Vision Meets Databases (CVDB) in conjunction with ACM SIGMOD, pp. 5–11, 2005

  9. D. Heckerman, R. Shachter, Decision-theoretic foundations for causal reasoning. Microsoft technical report MSR-TR-94-11 (1994)

  10. D. Heckerman, A Bayesian approach to learning causal networks, in Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 285–295, 1995

  11. J. Pearl, Causality: Models, Reasoning and Inference (Cambridge University Press, Cambridge, 2000)

  12. J. Pearl, Causal inference in the health sciences: A conceptual introduction. Special issue on causal inference, Health Services and Outcomes Research Methodology, vol. 2, pp. 189–220 (Kluwer Academic Publishers, 2001)

  13. L.R. Novick, P.W. Cheng, Assessing interactive causal influence. Psychol. Rev. 111(2), 455–485 (2004)

  14. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, pp. 107–118, October 2001

  15. K. Barnard, D. Forsyth, Learning the semantics of words and pictures, in Proceedings of IEEE ICCV, vol. 2, pp. 408–415, 2001

  16. J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, in Proceedings of ACM Multimedia, pp. 483–484, 2000

  17. M. Davis, S. King, N. Good, R. Sarvas, From context to content: leveraging context to infer media metadata, in Proceedings of the ACM International Conference on Multimedia, pp. 188–195, 2004

  18. A.K. Dey, Understanding and using context. Pers. Ubiquitous Comput. J. 5(1), 4–7 (2001)

  19. D. Spinellis, Position-annotated photographs: a geotemporal web. IEEE Pervasive Comput. 2(2) (2003)

  20. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, A. Paepcke, Context data in geo-referenced digital photo collections, in Proceedings of ACM International Conference on Multimedia, pp. 196–203, 2004

  21. R. Jain, P. Sinha, Content without context is meaningless, in Proceedings of ACM Multimedia, pp. 1259–1268, 2010

  22. http://www.exif.org

  23. M. Stricker, M. Orengo, Similarity of color images, in Proceedings SPIE Storage and Retrieval for Image and Video Databases, 1995

  24. J.R. Smith, S.F. Chang, Tools and techniques for color image retrieval, in SPIE Proceedings Storage and Retrieval for Image and Video Databases IV, 1995

  25. Y. Rui, A.C. She, T.S. Huang, Modified Fourier descriptors for shape representations – a practical approach, in Proceedings of First International Workshop on Image Databases and Multi Media Search, 1996

  26. Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of IEEE CVPR, 2004

  27. L. Khan, D. McLeod, Effective retrieval of audio information from annotated text using ontologies, in Proceedings of Workshop of Multimedia Data Mining with ACM SIGKDD, pp. 37–45, 2000

  28. J.R. Smith, S.F. Chang, Visually searching the web for content. IEEE Multimedia 4(3), 12–20 (1997)

  29. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in Proceedings of IEEE CVPR, pp. 248–255, 2009

  30. J. Williamson, Causality, in Handbook of Philosophical Logic, ed. by D. Gabbay, F. Guenthner (Kluwer, 2005)

  31. D. Geiger, D. Heckerman, Knowledge representation and inference in similarity networks and Bayesian multinets. Artif. Intell. 82, 45–74 (1996)

  32. N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)

  33. E.B. Goldstein, Sensation and Perception, 5th edn. (Wadsworth, Dordrecht, 1999)

  34. N. Friedman, D. Koller, Learning Bayesian networks from data (tutorial), in Proceedings of NIPS, 2000

  35. J.B. Tenenbaum, T.L. Griffiths, Generalization, similarity, and Bayesian inference. Behav. Brain Sci. 24, 629–641 (2001)

  36. P.J. Doshi, L.G. Greenwald, J.R. Clarke, Using Bayesian networks for cleansing trauma data, in Proceedings of FLAIRS Conference, pp. 72–76, 2003

  37. T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)

  38. NIST. Common Evaluation Measures. Appendix in Special Publication 500-250 (TREC 2001), 2001

  39. J. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, in Advances in Large Margin Classifiers (MIT Press, Cambridge, 1999)

  40. Y. Wu, B.L. Tseng, J.R. Smith, Ontology-based multi-classification learning for video concept detection, in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1003–1006, 2004

Author information

Correspondence to Edward Y. Chang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press

About this chapter

Cite this chapter

Chang, E.Y. (2011). Fusing Content and Context with Causality. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_7

  • DOI: https://doi.org/10.1007/978-3-642-20429-6_7

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20428-9

  • Online ISBN: 978-3-642-20429-6

  • eBook Packages: Computer Science, Computer Science (R0)
