Languages for Web Data Extraction

Kushmerick, Nicholas

doi:10.1007/978-1-4899-7993-3_1156-3

Synonyms

Information extraction; Screen scraping; Web mining; Web scraping; Web site wrappers

Definition

Web data extraction is the process of automatically converting Web resources into a specific structured format. For example, if a collection of HTML Web pages describes various companies (name, headquarters, etc.), then Web data extraction involves converting this native HTML format into computer-processable data structures, such as rows in relational database tables. The purpose of Web data extraction is to make Web data available for subsequent manipulation or integration steps. In the example above, the goal might be to summarize the extracted records in an analytical report.
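The following Python is a minimal sketch of the company example. It uses only the standard library (`html.parser`, `sqlite3`). The page markup, the `name`/`hq` class attributes, and the company names are hypothetical assumptions: real sites use their own markup, and this entry does not prescribe this approach. Each page is parsed into a record, the records are loaded into a relational table, and a small aggregate query stands in for the analytical report.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample pages; the markup and class names are assumptions.
PAGES = [
    '<html><body><h1 class="name">Acme Corp</h1>'
    '<p>HQ: <span class="hq">Springfield</span></p></body></html>',
    '<html><body><h1 class="name">Globex</h1>'
    '<p>HQ: <span class="hq">Springfield</span></p></body></html>',
    '<html><body><h1 class="name">Initech</h1>'
    '<p>HQ: <span class="hq">Austin</span></p></body></html>',
]

# Maps an (assumed) HTML class attribute to a database column.
FIELD_FOR_CLASS = {"name": "name", "hq": "headquarters"}


class CompanyParser(HTMLParser):
    """Collects the text inside elements whose class marks a known field."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # column currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in FIELD_FOR_CLASS:
            self._field = FIELD_FOR_CLASS[cls]

    def handle_endtag(self, tag):
        self._field = None  # any closing tag ends the current field

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] = self.record.get(self._field, "") + data


def extract(page):
    """Convert one HTML page into a dict of column -> value."""
    parser = CompanyParser()
    parser.feed(page)
    parser.close()
    return {k: v.strip() for k, v in parser.record.items()}


def load(records):
    """Load extracted records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (name TEXT, headquarters TEXT)")
    conn.executemany(
        "INSERT INTO companies (name, headquarters) VALUES (:name, :headquarters)",
        records,
    )
    return conn


conn = load([extract(p) for p in PAGES])
# A downstream "report": number of companies per headquarters city.
report = conn.execute(
    "SELECT headquarters, COUNT(*) FROM companies "
    "GROUP BY headquarters ORDER BY headquarters"
).fetchall()
print(report)  # [('Austin', 1), ('Springfield', 2)]
```

Once the data sits in a table, the report is an ordinary SQL query; the extraction step is what makes that possible.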

There are several approaches to Web data extraction. The most common is to specify the conversion process in a special-purpose programming language for Web data extraction. Web data extraction then becomes a matter of executing a well-defined computer program.
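The idea that extraction reduces to running a program can be sketched with a toy declarative rule format and a generic interpreter. The rule format below (an ordered list of field name, left delimiter, right delimiter) and the sample page are hypothetical illustrations, not a language described in this entry. The "program" is the rule list; `run_wrapper` is the interpreter that executes it.

```python
from typing import Dict, List, Tuple

# A rule is (field, left delimiter, right delimiter). This format is a
# hypothetical toy for illustration, not a published extraction language.
Rule = Tuple[str, str, str]


def run_wrapper(rules: List[Rule], page: str) -> List[Dict[str, str]]:
    """Apply the rules left to right, repeatedly, emitting one record per pass."""
    if not rules:
        return []
    if any(not left or not right for _, left, right in rules):
        raise ValueError("delimiters must be non-empty")  # avoids an infinite loop
    records = []
    pos = 0
    while True:
        record = {}
        for field, left, right in rules:
            start = page.find(left, pos)
            if start == -1:
                return records  # no further complete record on the page
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return records
            record[field] = page[start:end]
            pos = end + len(right)
        records.append(record)


# The "program" for a hypothetical company-listing page:
# names are wrapped in <b>, headquarters in <i>.
COMPANY_RULES: List[Rule] = [
    ("name", "<b>", "</b>"),
    ("headquarters", "<i>", "</i>"),
]

page = (
    "<ul><li><b>Acme Corp</b>, <i>Springfield</i></li>"
    "<li><b>Initech</b>, <i>Austin</i></li></ul>"
)
records = run_wrapper(COMPANY_RULES, page)
print(records)
```

Under this design, supporting a different site means writing new rules rather than new interpreter code. Simple delimiter rules like these are fragile, though: a small change in the page markup can break them.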

Author information

Authors and Affiliations

VMWare, Seattle, WA, USA
Nicholas Kushmerick

Corresponding author

Correspondence to Nicholas Kushmerick.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, Georgia, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, Ontario, Canada
M. Tamer Özsu

Section Editor information

Computing Laboratory, Oxford University, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Georg Gottlob

About this entry

Cite this entry

Kushmerick, N. (2017). Languages for Web Data Extraction. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_1156-3

DOI: https://doi.org/10.1007/978-1-4899-7993-3_1156-3
Received: 26 April 2016
Accepted: 14 June 2016
Published: 16 February 2017
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4899-7993-3
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering

Chapter history

Latest: Languages for Web Data Extraction
Published: 16 February 2017
DOI: https://doi.org/10.1007/978-1-4899-7993-3_1156-3

Original: Languages for Web Data Extraction
Published: 29 November 2016
DOI: https://doi.org/10.1007/978-1-4899-7993-3_1156-2
