Abstract
Data integration systems (DIS) are devoted to providing information by integrating and transforming data extracted from external sources. Examples of DIS include mediators, data warehouses, database federations, and web portals. Data quality is an essential issue in DIS because it affects users' confidence in the supplied information. One of the main challenges in this field is to offer rigorous and practical means of evaluating the quality of a DIS. In this sense, the reliability of a DIS represents its capability to provide data with a certain level of quality, taking into account not only current quality values but also the changes that may occur in data quality at the external sources. Simulation techniques constitute a non-traditional approach to data quality evaluation, and more specifically to the evaluation of DIS reliability. This chapter presents techniques for evaluating DIS reliability that apply simulation in addition to exact computation models. Simulation makes it possible to address two important drawbacks of exact techniques: the poor scalability of the reliability computation as the set of data sources grows, and the difficulty of modeling data sources whose quality properties are inter-related (non-independent).
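The contrast between exact computation and simulation can be illustrated with a minimal sketch. The model here is deliberately simplified, hypothetical, and not the chapter's own. Each source is assumed to be in an acceptable quality state independently with probability p_i. A hypothetical structure function, `dis_structure`, then decides whether the integrated result meets the required quality level. Exact evaluation enumerates all 2^n source states, so its cost doubles with every added source. Crude Monte Carlo instead costs O(n) per sample. The variant `mc_reliability_common_shock` adds an assumed common-cause event that degrades all sources at once. It shows that dependence between sources changes only the sampling step, not the estimator.

```python
import itertools
import random

# Hypothetical model (an assumption, not the chapter's model): source i is in an
# acceptable quality state with probability p[i]; a structure function maps the
# tuple of source states to True when the DIS delivers the required quality.


def dis_structure(state):
    """Hypothetical DIS: the query needs source 0 and at least one of sources 1, 2."""
    return state[0] and (state[1] or state[2])


def exact_reliability(p, structure):
    """Exact reliability for independent sources by enumerating all 2**n states."""
    total = 0.0
    for state in itertools.product((False, True), repeat=len(p)):
        prob = 1.0
        for ok, pi in zip(state, p):
            prob *= pi if ok else 1.0 - pi
        if structure(state):
            total += prob
    return total


def mc_reliability(p, structure, n_samples=100_000, seed=0):
    """Crude Monte Carlo estimate and its standard error (independent sources)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        state = tuple(rng.random() < pi for pi in p)
        if structure(state):
            hits += 1
    r_hat = hits / n_samples
    # Standard error of a binomial proportion.
    std_err = (r_hat * (1.0 - r_hat) / n_samples) ** 0.5
    return r_hat, std_err


def mc_reliability_common_shock(p, structure, shock_prob, n_samples=100_000, seed=0):
    """Monte Carlo with an assumed common-cause event that degrades every source
    at once, so source states are no longer independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        if rng.random() < shock_prob:
            state = tuple(False for _ in p)  # shared upstream failure
        else:
            state = tuple(rng.random() < pi for pi in p)
        if structure(state):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    p = [0.9, 0.8, 0.7]
    print("exact:", exact_reliability(p, dis_structure))
    print("monte carlo (estimate, std err):", mc_reliability(p, dis_structure))
    print("with common shock:", mc_reliability_common_shock(p, dis_structure, 0.1))
```

For these illustrative probabilities, the exact value is 0.9 × (1 − 0.2 × 0.3) = 0.846, and the Monte Carlo estimate should fall within a few standard errors of it. With the assumed shock probability of 0.1, the expected reliability drops to 0.9 × 0.846 ≈ 0.761. The chapter's own models and estimators may differ. The references on rare-event and splitting methods suggest variance-reduction techniques beyond this crude estimator.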
Notes
1. In the solution given in this work, we do not differentiate between “quality factor” and “quality dimension.”
Copyright information
© 2010 Springer-Verlag London Limited
About this chapter
Cite this chapter
Marotta, A., Cancela, H., Peralta, V., Ruggia, R. (2010). Reliability Models for Data Integration Systems. In: Faulin, J., Juan, A., Martorell, S., Ramírez-Márquez, JE. (eds) Simulation Methods for Reliability and Availability of Complex Systems. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-1-84882-213-9_6
DOI: https://doi.org/10.1007/978-1-84882-213-9_6
Publisher Name: Springer, London
Print ISBN: 978-1-84882-212-2
Online ISBN: 978-1-84882-213-9
eBook Packages: Engineering, Engineering (R0)