Data Mapper: An Operator for Expressing One-to-Many Data Transformations

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Abstract

Transforming data is a fundamental operation in application scenarios involving data integration, legacy data migration, data cleaning, and extract-transform-load processes. Data transformations are often implemented as relational queries that aim at leveraging the optimization capabilities of most RDBMSs. However, relational query languages like SQL are not expressive enough to specify an important class of data transformations that produce several output tuples for a single input tuple. This class of data transformations is required for solving the data heterogeneities that occur when source data represents an aggregation of target data.
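To make the notion of a one-to-many transformation concrete, here is a minimal Python sketch. The scenario, the field names (`account`, `total`, `seq`, `amount`), and the helpers `split_into_installments` and `apply_mapper` are all hypothetical and chosen only for illustration; they are not taken from the paper. Each source row stores an aggregated total, and the target schema wants one row per installment, so a single input tuple yields several output tuples.

```python
from typing import Callable, Iterable, Iterator


def split_into_installments(row: dict, max_installment: int = 100) -> Iterator[dict]:
    """Expand one source row holding a total into several target rows.

    Hypothetical example: the number of output rows depends on the data,
    which is what makes the transformation one-to-many.
    """
    remaining = row["total"]
    seq = 1
    while remaining > 0:
        amount = min(remaining, max_installment)
        yield {"account": row["account"], "seq": seq, "amount": amount}
        remaining -= amount
        seq += 1


def apply_mapper(rows: Iterable[dict], fn: Callable[[dict], Iterable[dict]]) -> Iterator[dict]:
    """Apply a one-to-many function to every input row and flatten the results."""
    for row in rows:
        yield from fn(row)


source = [{"account": "A1", "total": 250}, {"account": "B2", "total": 80}]
target = list(apply_mapper(source, split_into_installments))
```

Here the two source rows produce four target rows (three for `A1`, one for `B2`). Because the number of output rows per input row is not fixed in advance, the expansion cannot be written as a simple per-row projection.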

In this paper, we propose and formally define the data mapper operator as an extension of the relational algebra to address one-to-many data transformations. We supply an algebraic rewriting technique that enables the optimization of data transformation expressions that combine filters expressed as standard relational operators with mappers. Furthermore, we identify the two main factors that influence the expected optimization gains.
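The abstract does not state the rewriting rules themselves, so the following is only a sketch of the general idea behind combining filters with a one-to-many operator, not the paper's technique. It assumes a hypothetical mapper function `to_quarters` that copies the `region` attribute unchanged; under that assumption, a selection on `region` can be applied before the expansion instead of after it, with the same result and fewer mapper invocations. All names here are illustrative.

```python
def mapper(rows, fn):
    """Apply a one-to-many function to each row and flatten the results."""
    return [out for row in rows for out in fn(row)]


def select(rows, pred):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]


calls = {"count": 0}  # counts mapper-function invocations, for comparison only


def to_quarters(row):
    """Hypothetical mapper function: one yearly row becomes four quarterly rows.

    The 'region' attribute is passed through unchanged, which is the
    assumption that makes the rewrite below safe.
    """
    calls["count"] += 1
    return [
        {"region": row["region"], "quarter": q, "amount": row["yearly"] // 4}
        for q in range(1, 5)
    ]


def is_north(row):
    return row["region"] == "north"


source = [{"region": "north", "yearly": 400}, {"region": "south", "yearly": 800}]

# Filter applied after the expansion: the mapper function runs on every source row.
calls["count"] = 0
naive = select(mapper(source, to_quarters), is_north)
naive_calls = calls["count"]

# Filter pushed before the expansion: the mapper function runs only on kept rows.
calls["count"] = 0
pushed = mapper(select(source, is_north), to_quarters)
pushed_calls = calls["count"]
```

Both plans return the same four rows, but the second evaluates `to_quarters` once instead of twice. In this toy setting the saving grows with the fraction of rows the filter discards and with the cost of each mapper-function call; whether these correspond to the two factors the paper identifies is not stated in the abstract.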

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carreira, P., Galhardas, H., Pereira, J., Lopes, A. (2005). Data Mapper: An Operator for Expressing One-to-Many Data Transformations. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_14

  • DOI: https://doi.org/10.1007/11546849_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28558-8

  • Online ISBN: 978-3-540-31732-6

  • eBook Packages: Computer Science, Computer Science (R0)
