Abstract
In this paper, we present our work on a framework for modeling and optimizing Extraction-Transformation-Loading (ETL) workflows. The goal of this research was to facilitate, manage, and optimize the design and implementation of ETL workflows, both during the initial design and deployment stage and during the continuous evolution of a data warehouse. In particular, we present results that include: (a) a novel conceptual model for tracing inter-attribute relationships and the respective ETL transformations in the early stages of a data warehouse project, along with an attempt to use ontology-based mechanisms to semi-automatically capture the semantics of, and the relationships among, the various sources; (b) a novel logical model for representing ETL workflows, with two main characteristics: genericity and customization; (c) a semi-automatic transition from the conceptual to the logical model for ETL workflows; and (d) the tuning of an ETL workflow to optimize the execution order of its operations. Finally, we discuss future work in the area that we consider important, as a step towards incorporating these research results into other areas as well.
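To make point (d) concrete, the following is a minimal sketch of why the execution order of operations matters. It is not the optimization method of the paper. It assumes a linear workflow whose activities can all be freely swapped, a hypothetical cost model in which each activity has a per-row cost and a selectivity (the fraction of rows it lets through), and an exhaustive search over orderings. All activity names and numbers are made up for illustration.

```python
from itertools import permutations

# Hypothetical activities: (name, cost per input row, selectivity).
# Selectivity is the fraction of input rows that the activity outputs.
ACTIVITIES = [
    ("convert_currency", 3.0, 1.0),
    ("filter_not_null", 1.0, 0.5),
    ("filter_recent", 1.0, 0.25),
]


def workflow_cost(order, input_rows):
    """Sum the per-activity costs of a linear workflow run in the given order."""
    rows = float(input_rows)
    total = 0.0
    for _name, cost_per_row, selectivity in order:
        total += rows * cost_per_row  # cost of processing the rows that arrive
        rows *= selectivity           # rows passed on to the next activity
    return total


def best_order(activities, input_rows):
    """Exhaustively pick the cheapest ordering (assumes every swap is legal)."""
    return min(permutations(activities),
               key=lambda order: workflow_cost(order, input_rows))


if __name__ == "__main__":
    print("as designed:", workflow_cost(ACTIVITIES, 1000))
    cheapest = best_order(ACTIVITIES, 1000)
    print("reordered:  ", [name for name, _, _ in cheapest],
          workflow_cost(cheapest, 1000))
```

Under this toy model, running the most selective filters first shrinks the input of the expensive conversion step. A real workflow would also have to check that a swap preserves the semantics of the activities involved, and exhaustive enumeration quickly becomes impractical as the number of activities grows.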
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sellis, T.K., Simitsis, A. (2007). ETL Workflows: From Formal Specification to Optimization. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds) Advances in Databases and Information Systems. ADBIS 2007. Lecture Notes in Computer Science, vol 4690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75185-4_1
DOI: https://doi.org/10.1007/978-3-540-75185-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75184-7
Online ISBN: 978-3-540-75185-4
eBook Packages: Computer Science (R0)