Abstract
Aggregative Data Infrastructures (ADIs) are information systems that integrate content collected from data sources into uniform, richer information spaces and offer communities of users enhanced access services over that content. The resulting information spaces are an important asset for the target communities, whose services demand guarantees of “correctness” and “quality” over time, in terms of both the expected content (structure and semantics) and the processes that generate it. Continuous application-level monitoring of ADIs therefore becomes crucial to validating quality. However, most ADIs are patchworks of software components and services, in some cases developed independently, built over time to address evolving requirements. As such, they are generally not equipped with embedded monitoring components, and ADI admins must rely on third-party monitoring systems. In this paper we describe DataQ, a general-purpose system for flexible and cost-effective data flow quality monitoring in ADIs. DataQ supports ADI admins with a framework in which they can (i) represent ADI data flows and the related monitoring specification, and (ii) be instructed on how to meet that specification on the ADI side to implement their monitoring functionality.
Notes
1. CORE - The UK Open Access Aggregator, https://core.ac.uk.
2. Europeana, http://www.europeana.eu.
3. Prometheus, http://prometheus.io.
4. The Elastic stack, https://www.elastic.co.
5. The OpenAIRE EU project, http://www.openaire.eu.
6. Apache HBase, https://hbase.apache.org.
7. Taverna, http://www.taverna.org.uk.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mannocci, A., Manghi, P. (2016). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_28
DOI: https://doi.org/10.1007/978-3-319-43997-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science (R0)