Abstract
In this paper, we present WPPS, a new configurable Java-based framework for developing web page processing methods. The key innovations of WPPS are 1) a unified ontological model that describes the visual representation of web pages; and 2) an API and abstractions that allow new methods and approaches to be developed using both declarative and object-oriented mechanisms.
This work is funded by the Austrian Forschungsförderungsgesellschaft FFG under grant 829614 (TAMCROW).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fayzrakhmanov, R.R. (2012). WPPS: A Framework for Web Page Processing. In: Wang, X.S., Cruz, I., Delis, A., Huang, G. (eds) Web Information Systems Engineering - WISE 2012. WISE 2012. Lecture Notes in Computer Science, vol 7651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35063-4_70
DOI: https://doi.org/10.1007/978-3-642-35063-4_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35062-7
Online ISBN: 978-3-642-35063-4
eBook Packages: Computer Science, Computer Science (R0)