Abstract
On-line retailers as well as e-shoppers are very interested in gathering product records from the Web in order to compare products and prices. Consumers compare products and prices to find the best price for a specific product or to identify alternatives to a product, whereas on-line retailers need to compare their offers with those of their competitors in order to remain competitive. As there is a huge number and a vast array of product offers on the Web, the product data needs to be collected through an automated approach. The contribution of this paper is a novel approach for automatically identifying and extracting product records from arbitrary e-shop websites. The approach extends an existing technique called Tag Path Clustering, which clusters similar HTML tag paths. The clustering mechanism is combined with a novel filtering mechanism for identifying the product records to be extracted from the websites.
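To make the notion of an HTML tag path concrete, the following minimal Python sketch collects the path from the document root to every element and keeps the paths that occur repeatedly, since repeated paths are a typical sign of template-generated records such as product listings. It is an illustration under simplifying assumptions, not the approach presented in this paper: it groups only identical paths instead of clustering similar ones, it contains no filtering step, and the names TagPathCollector, repeated_tag_paths, VOID_TAGS and SAMPLE as well as the sample markup are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []                # currently open tags, root first
        self.paths = defaultdict(int)  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        self.paths[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=2):
    """Return the tag paths that occur at least min_count times."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {p: c for p, c in collector.paths.items() if c >= min_count}


# Hypothetical product listing used only for demonstration.
SAMPLE = """
<html><body>
<ul>
  <li><span>Camera A</span><b>199.00</b></li>
  <li><span>Camera B</span><b>249.00</b></li>
  <li><span>Camera C</span><b>299.00</b></li>
</ul>
<p>About us</p>
</body></html>
"""

if __name__ == "__main__":
    for path, count in sorted(repeated_tag_paths(SAMPLE, min_count=3).items()):
        print(count, path)
```

In the sample page, only the paths below the list items repeat three times, which marks them as candidate product records, while the single paragraph is ignored. A real extractor would additionally have to tolerate small differences between paths and to separate product records from other repeated structures such as navigation menus.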
Acknowledgements
The work published in this article was partially funded by the SME E-COMPASS project of the European Union’s Seventh Framework Programme for research, technological development and demonstration under the grant agreement no. 315637.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Horch, A., Kett, H., Weisbecker, A. (2016). Extracting Product Offers from e-Shop Websites. In: Monfort, V., Krempels, KH., Majchrzak, T.A., Turk, Ž. (eds) Web Information Systems and Technologies. WEBIST 2015. Lecture Notes in Business Information Processing, vol 246. Springer, Cham. https://doi.org/10.1007/978-3-319-30996-5_12
DOI: https://doi.org/10.1007/978-3-319-30996-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30995-8
Online ISBN: 978-3-319-30996-5
eBook Packages: Computer Science, Computer Science (R0)