Abstract
Geo-tagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geo-tagging related research within the context of multimedia along three dimensions: (1) modalities from which geographical information can be extracted, (2) applications that can benefit from the use of geographical information, and (3) the interplay between modalities and applications. The survey introduces the research problems and discusses significant approaches. We discuss the nature of the different modalities and lay out the factors that are expected to govern choices among them in multimedia and vision applications. Finally, we discuss future research directions in this field.
Cite this article
Luo, J., Joshi, D., Yu, J. et al. Geotagging in multimedia and computer vision—a survey. Multimed Tools Appl 51, 187–211 (2011). https://doi.org/10.1007/s11042-010-0623-y
DOI: https://doi.org/10.1007/s11042-010-0623-y