Abstract
Most Web pages contain location information, which can be used to improve the effectiveness of search engines. In this paper, we concentrate on focused locations, that is, the locations most appropriately associated with a Web page. Current algorithms suffer from ambiguity among location names: many different locations share the same name (known as GEO/GEO ambiguity), and some locations share a name with non-geographical entities such as persons (known as GEO/NON-GEO ambiguity). We first propose a new algorithm named GeoRank, which uses an idea similar to PageRank to resolve GEO/GEO ambiguity. We also introduce heuristic rules to eliminate GEO/NON-GEO ambiguity. We then present an algorithm with dynamic parameters that determines the focused locations. We conduct experiments on two real datasets to evaluate the performance of our approach. The experimental results show that our algorithm outperforms state-of-the-art methods in both disambiguation and focused-location determination.
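The abstract only says that GeoRank borrows an idea from PageRank, so the following is a minimal sketch of what PageRank-style voting for GEO/GEO disambiguation could look like, not the paper's actual algorithm. Every name here (`rank_candidates`, `haversine_km`), the distance-based vote weight, and the damping factor of 0.85 are assumptions made for illustration: each candidate reading of a place name votes for candidate readings of the other names on the page, with nearer candidates receiving larger votes.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_candidates(candidates, damping=0.85, iterations=50):
    """Pick one reading per place name by PageRank-style voting.

    candidates maps a place name to a list of (label, (lat, lon)) readings.
    The vote weight 1 / (1 + distance) is an illustrative assumption.
    """
    nodes = [(name, label, coord)
             for name, readings in candidates.items()
             for label, coord in readings]
    n = len(nodes)
    if n == 0:
        return {}

    # weight[i][j] is the share of node i's vote given to node j.
    # Readings of the same name never vote for each other.
    weight = [[0.0] * n for _ in range(n)]
    for i, (name_i, _, coord_i) in enumerate(nodes):
        for j, (name_j, _, coord_j) in enumerate(nodes):
            if name_i != name_j:
                weight[i][j] = 1.0 / (1.0 + haversine_km(coord_i, coord_j))
        total = sum(weight[i])
        if total > 0:
            weight[i] = [w / total for w in weight[i]]

    # Power iteration with a uniform teleport term, as in PageRank.
    score = [1.0 / n] * n
    for _ in range(iterations):
        score = [(1 - damping) / n
                 + damping * sum(score[i] * weight[i][j] for i in range(n))
                 for j in range(n)]

    # Keep the highest-scoring reading for each name.
    best = {}
    for (name, label, _), s in zip(nodes, score):
        if name not in best or s > best[name][1]:
            best[name] = (label, s)
    return {name: label for name, (label, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical page mentioning three place names; coordinates are approximate.
    page = {
        "Washington": [("Washington, D.C.", (38.9, -77.04)),
                       ("Washington State", (47.5, -120.5))],
        "Seattle": [("Seattle, WA", (47.6, -122.33))],
        "Tacoma": [("Tacoma, WA", (47.25, -122.44))],
    }
    print(rank_candidates(page))
```

In this toy input the state reading of "Washington" collects far larger votes from Seattle and Tacoma than the D.C. reading does, which is the kind of mutual reinforcement among co-occurring names that a PageRank-like scheme relies on.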
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Q., Jin, P., Lin, S., Yue, L. (2012). Extracting Focused Locations for Web Pages. In: Wang, L., Jiang, J., Lu, J., Hong, L., Liu, B. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 7142. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28635-3_7
DOI: https://doi.org/10.1007/978-3-642-28635-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28634-6
Online ISBN: 978-3-642-28635-3
eBook Packages: Computer Science (R0)