Extracting Product Offers from e-Shop Websites

  • Conference paper
Web Information Systems and Technologies (WEBIST 2015)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 246)

Abstract

On-line retailers as well as e-shoppers are very interested in gathering product records from the Web in order to compare products and prices. Consumers compare products and prices to find the best price for a specific product or to identify alternatives to it, whereas on-line retailers need to compare their offers with those of their competitors in order to remain competitive. As there is a huge number and vast array of product offers on the Web, the product data needs to be collected through an automated approach. The contribution of this paper is a novel approach for automatically identifying and extracting product records from arbitrary e-shop websites. The approach extends an existing technique called Tag Path Clustering, which clusters similar HTML tag paths. The clustering mechanism is combined with a novel filtering mechanism for identifying the product records to be extracted within the websites.
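To make the notion of an HTML tag path concrete, the following is a minimal Python sketch. It is not the authors' implementation: it only records the root-to-element tag path of every element and groups elements whose paths are identical, whereas the approach in the paper clusters similar paths and applies an additional filtering step. The names `TagPathCollector`, `repeated_tag_paths`, and `min_count`, as well as the sample page, are assumptions made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []                # tags that are currently open
        self.paths = defaultdict(int)  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=2):
    """Return the tag paths that occur at least min_count times (hypothetical helper)."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {path: n for path, n in collector.paths.items() if n >= min_count}


# Hypothetical product listing used only for illustration.
sample = """
<html><body>
<ul>
  <li><span>Shoe A</span><b>19.99</b></li>
  <li><span>Shoe B</span><b>24.99</b></li>
  <li><span>Shoe C</span><b>29.99</b></li>
</ul>
<p>About us</p>
</body></html>
"""

for path, count in repeated_tag_paths(sample).items():
    print(count, path)
```

In this simplified setting, paths that repeat many times on a page are candidates for record-like regions such as product lists. Exact path matching is a deliberate simplification and would miss records whose markup varies slightly.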

Notes

  1. http://open.dapper.net/.

  2. https://www.kimonolabs.com/.

  3. https://import.io/.

  4. http://www.diffbot.com/products/crawlbot/.

  5. http://www.webkit.org/.

  6. http://mathworld.wolfram.com/PrimeSpiral.html.

  7. http://www.bestarabic.com/mall/ar/.

  8. http://docs.seleniumhq.org/projects/webdriver/.

  9. https://www.python.org/.

  10. http://selenium-python.readthedocs.org/en/latest/api.html.

  11. http://www.crummy.com/software/BeautifulSoup/.

  12. http://www.cs.uic.edu/~liub/WebDataExtraction/MDR-download.html.

  13. http://clustvx.no-ip.org/.

  14. https://www.python.org/.

  15. http://selenium-python.readthedocs.org/en/latest/api.html.

  16. http://www.crummy.com/software/BeautifulSoup/.

  17. http://www.cs.uic.edu/~liub/WebDataExtraction/MDR-download.html.

  18. http://clustvx.no-ip.org/.

References

  1. Nagelvoort, B., et al.: European B2C E-commerce report. Online (2014). http://www.adigital.org/sites/default/files/studies/european-b2c-ecommerce-report-2014.pdf

  2. Simon, H., Fassnacht, M.: Preismanagement: Strategie - Analyse - Entscheidung - Umsetzung. Gabler Verlag, Wiesbaden (2008)

  3. McGovern, C., Levesanos, A.: Optimizing pricing and promotions in a digital world: from product-led to customer-centric strategies (2014). http://www.accenture.com/us-en/Pages/insight-optimizing-pricing-promotions-digital-world-summary.aspx

  4. Grigalis, T.: Towards web-scale structured web data extraction. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013, pp. 753–758 (2013)

  5. Grigalis, T., Cenys, A.: Unsupervised structured data extraction from template-generated web pages. J. Univ. Comput. Sci. 20, 169–192 (2014)

  6. Liu, B., Grossman, R., Zhai, Y.: Mining data records in web pages. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 601–606 (2003)

  7. Zhao, H., et al.: Fully automatic wrapper generation for search engines. In: Proceedings of the 14th International Conference on World Wide Web, WWW 2005, pp. 66–75 (2005)

  8. Walther, M., et al.: Locating and extracting product specifications from producer websites. In: Proceedings of the 12th International Conference on Enterprise Information Systems, ICEIS 2010, pp. 13–22 (2010)

  9. Liu, B.: Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data, pp. 1–14. Springer, Heidelberg (2006)

  10. Anderson, N., Hong, J.: Visually extracting data records from the deep web. In: Proceedings of the 22nd International World Wide Web Conference, WWW 2013, pp. 1233–1238 (2013)

  11. Real, R., Vargas, J.M.: The probabilistic basis of Jaccard’s index of similarity. Syst. Biol. 3, 380–385 (1996)

  12. Horch, A., Kett, H., Weisbecker, A.: A lightweight approach for extracting product records from the web. In: Proceedings of the 11th International Conference on Web Information Systems and Technologies, WEBIST 2015, pp. 420–430 (2015)

  13. Peters, J.F.: Topology of Digital Images. Visual Pattern Discovery in Proximity Spaces. ISRL, vol. 63, pp. 1–76. Springer, Heidelberg (2014)

  14. PostNord: E-Commerce in Europe 2014 (2014). http://www.postnord.com/globalassets/global/english/document/publications/2014/e-commerce-in-europe-2014.pdf

  15. Van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, New York (1979)

Acknowledgements

The work published in this article was partially funded by the SME E-COMPASS project of the European Union’s Seventh Framework Programme for research, technological development and demonstration under the grant agreement no. 315637.

Author information

Correspondence to Andrea Horch.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Horch, A., Kett, H., Weisbecker, A. (2016). Extracting Product Offers from e-Shop Websites. In: Monfort, V., Krempels, KH., Majchrzak, T.A., Turk, Ž. (eds) Web Information Systems and Technologies. WEBIST 2015. Lecture Notes in Business Information Processing, vol 246. Springer, Cham. https://doi.org/10.1007/978-3-319-30996-5_12

  • DOI: https://doi.org/10.1007/978-3-319-30996-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30995-8

  • Online ISBN: 978-3-319-30996-5

  • eBook Packages: Computer Science, Computer Science (R0)
