ETL Workflows: From Formal Specification to Optimization

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4690)

Abstract

In this paper, we present our work on a framework for the modeling and optimization of Extraction-Transformation-Loading (ETL) workflows. The goal of this research was to facilitate, manage, and optimize the design and implementation of ETL workflows, both during the initial design and deployment stage and during the continuous evolution of a data warehouse. In particular, our results include: (a) a novel conceptual model for tracing inter-attribute relationships and the respective ETL transformations in the early stages of a data warehouse project, along with an attempt to use ontology-based mechanisms to semi-automatically capture the semantics of, and the relationships among, the various sources; (b) a novel logical model for the representation of ETL workflows with two main characteristics: genericity and customization; (c) the semi-automatic transition from the conceptual to the logical model for ETL workflows; and (d) the tuning of an ETL workflow by optimizing the execution order of its operations. Finally, we discuss open issues for future work in the area that we consider important, as a step towards incorporating the above research results into other areas as well.
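Point (d) can be made concrete with a small sketch. The Python example below is not the paper's algorithm; it only assumes a hypothetical two-step workflow, a per-row conversion (`convert_amount`) and a selective filter (`recent_only`), and shows that when two operations commute, running the filter first yields the same output while the conversion touches fewer rows. The activity names, the sample data, the conversion factor, and the rows-processed cost measure are all assumptions made for illustration.

```python
# Illustrative sketch only: activities, data, and cost measure are assumptions,
# not taken from the paper.

def convert_amount(rows, stats):
    """Per-row transformation; records how many rows it had to process."""
    stats["convert_amount"] += len(rows)
    # 0.5 is a made-up conversion factor.
    return [{**r, "amount": r["amount"] * 0.5} for r in rows]

def recent_only(rows, stats):
    """Selective filter; keeps rows from 2006 onwards."""
    stats["recent_only"] += len(rows)
    return [r for r in rows if r["year"] >= 2006]

def run(workflow, rows):
    """Execute activities in order and return (output rows, rows processed per activity)."""
    stats = {"convert_amount": 0, "recent_only": 0}
    for activity in workflow:
        rows = activity(rows, stats)
    return rows, stats

# Hypothetical source: eight rows with years 2001..2008.
SOURCE = [{"id": i, "year": 2000 + i, "amount": 10.0 * i} for i in range(1, 9)]

original, cost_original = run([convert_amount, recent_only], SOURCE)
reordered, cost_reordered = run([recent_only, convert_amount], SOURCE)

print(original == reordered)             # same result in both orders
print(cost_original["convert_amount"])   # rows converted when converting first
print(cost_reordered["convert_amount"])  # rows converted when filtering first
```

The swap is safe here only because the filter reads `year` while the conversion writes `amount`; if the filter depended on the converted value, the two orders would no longer be equivalent and the reordering would not be allowed.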

Editor information

Yannis Ioannidis, Boris Novikov, Boris Rachev

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sellis, T.K., Simitsis, A. (2007). ETL Workflows: From Formal Specification to Optimization. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds) Advances in Databases and Information Systems. ADBIS 2007. Lecture Notes in Computer Science, vol 4690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75185-4_1

  • DOI: https://doi.org/10.1007/978-3-540-75185-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75184-7

  • Online ISBN: 978-3-540-75185-4

  • eBook Packages: Computer Science, Computer Science (R0)