
Developing a Data Quality Algebra

  • Chapter
Data Quality

Part of the book series: Advances in Database Systems ((ADBS,volume 23))


Conclusion

We have presented a method for estimating the quality (accuracy) of query results derived from the base relations of a relational database. Because the quality of derived data is a function of the query that produces it, the emphasis is on estimating the quality of the output of every operator in the relational algebra. By postulating, in theoretical terms, the impact of each operator on quality, the quality profile of the output can be generated for any query composed of such operators. Some operators tend to increase the potential quality of the result, while others tend to decrease it. This analysis therefore provides a basis for further research on formulating queries in terms of preferred operators, with the objective of attaining the best achievable quality profile for a given set of data sources and their corresponding quality profiles.
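The idea of composing per-operator quality estimates can be illustrated with a minimal sketch. The propagation rules below are simplifying assumptions made for illustration, not the chapter's actual formulas: accuracy is taken to be the fraction of accurate tuples, errors are assumed uniform within a relation and independent across relations, and the names `Profile`, `select`, `product`, and `union` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical quality profile: tuple count and fraction of accurate tuples."""
    size: int
    accuracy: float


def select(r: Profile, selectivity: float) -> Profile:
    # Assumption: with uniform accuracy, a selection keeps accurate and
    # inaccurate tuples in the same proportion, so accuracy is unchanged.
    return Profile(round(r.size * selectivity), r.accuracy)


def product(r: Profile, s: Profile) -> Profile:
    # Assumption: a combined tuple is accurate only if both of its parts are,
    # and errors in the two relations are independent.
    return Profile(r.size * s.size, r.accuracy * s.accuracy)


def union(r: Profile, s: Profile) -> Profile:
    # Assumption: the inputs are disjoint, so the result accuracy is the
    # size-weighted average of the input accuracies.
    total = r.size + s.size
    if total == 0:
        return Profile(0, 1.0)
    return Profile(total, (r.size * r.accuracy + s.size * s.accuracy) / total)


if __name__ == "__main__":
    employees = Profile(size=100, accuracy=0.9)
    departments = Profile(size=50, accuracy=0.8)
    # Quality profile of a query built from two operators.
    print(product(select(employees, 0.5), departments))
```

Under these assumed rules a product can only lower accuracy, while a union yields a value between those of its inputs. This illustrates why the choice of operators in a query can matter for the quality of its output.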

Clearly, the validity of the computed quality depends on the validity of the quality figures for the underlying base relations. Those figures, obtained via sampling and other techniques, are themselves prone to error. In the absence of more detailed information for each base relation, the data quality profile has been assumed to be uniform; however, the impact of non-uniform accuracy profiles has also been studied. Defining rigorous techniques for estimating the idiosyncrasies of individual data parameters is an area requiring further research.
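To make the point about sampling error concrete, the following sketch estimates the accuracy of a base relation from a checked sample and attaches a normal-approximation confidence interval. This is one common textbook approach, assumed here for illustration only. The chapter does not specify its sampling technique, and the function name `estimate_accuracy` and its parameters are hypothetical.

```python
import math


def estimate_accuracy(n_correct: int, n_sampled: int, z: float = 1.96):
    """Return (estimate, lower, upper) for the fraction of accurate tuples.

    Uses the normal-approximation interval for a proportion; z = 1.96
    corresponds to roughly 95% confidence. Assumes a simple random sample.
    """
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    p = n_correct / n_sampled
    margin = z * math.sqrt(p * (1.0 - p) / n_sampled)
    # Clamp to [0, 1], since accuracy is a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    # 90 of 100 sampled tuples were verified as accurate.
    print(estimate_accuracy(90, 100))
```

The width of the interval indicates how much uncertainty in a base-relation figure would carry over into any quality estimate computed from it.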




Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

(2002). Developing a Data Quality Algebra. In: Data Quality. Advances in Database Systems, vol 23. Springer, Boston, MA. https://doi.org/10.1007/0-306-46987-1_5


  • DOI: https://doi.org/10.1007/0-306-46987-1_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-7215-8

  • Online ISBN: 978-0-306-46987-9

  • eBook Packages: Springer Book Archive
