DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Abstract

Aggregative Data Infrastructures (ADIs) are information systems that integrate content collected from data sources into uniform, richer information spaces and support communities of users with enhanced access services to that content. The resulting information spaces are an important asset for the target communities, whose services demand guarantees on their “correctness” and “quality” over time, in terms of both the expected content (structure and semantics) and the processes generating it. Continuous application-level monitoring of ADIs is therefore crucial to validate quality. However, most ADIs are patchworks of software components and services, in some cases developed independently, built over time to address evolving requirements. As such, they are generally not equipped with embedded monitoring components, and ADI admins must rely on third-party monitoring systems. In this paper we describe DataQ, a general-purpose system for flexible and cost-effective data flow quality monitoring in ADIs. DataQ offers ADI admins a framework in which they can (i) represent ADI data flows and the related monitoring specification, and (ii) be instructed on how to meet that specification on the ADI side to implement their monitoring functionality.
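
To make the idea of a data flow paired with a monitoring specification more concrete, the following is a minimal Python sketch. It is not DataQ's actual model or API, which the abstract does not detail: the class names (`Check`, `Step`, `DataFlow`), the metric names, and the thresholds are all hypothetical and chosen only for illustration. The sketch assumes that each step of a flow reports numeric metrics and that the specification lists the checks those metrics must satisfy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: none of these classes come from DataQ itself.


@dataclass
class Check:
    """A named rule applied to one observed metric value."""
    metric: str
    description: str
    passes: Callable[[float], bool]


@dataclass
class Step:
    """One stage of a data flow, with the checks to run on its output."""
    name: str
    checks: List[Check] = field(default_factory=list)


@dataclass
class DataFlow:
    """An ordered sequence of steps, e.g. collect -> transform."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def evaluate(self, observations: Dict[str, Dict[str, float]]) -> List[str]:
        """Return one message per failed or unreported check."""
        problems = []
        for step in self.steps:
            reported = observations.get(step.name, {})
            for check in step.checks:
                if check.metric not in reported:
                    problems.append(
                        f"{step.name}: metric '{check.metric}' was not reported"
                    )
                elif not check.passes(reported[check.metric]):
                    problems.append(
                        f"{step.name}: {check.description} "
                        f"(got {reported[check.metric]})"
                    )
        return problems


# Illustrative specification: thresholds are made up for the example.
flow = DataFlow(
    name="aggregation",
    steps=[
        Step("collect", [
            Check("record_count", "at least 1000 records expected",
                  lambda v: v >= 1000),
        ]),
        Step("transform", [
            Check("invalid_ratio", "at most 5% invalid records expected",
                  lambda v: v <= 0.05),
        ]),
    ],
)

# Metrics that the ADI side would report for one run of the flow.
observations = {
    "collect": {"record_count": 1200},
    "transform": {"invalid_ratio": 0.12},
}

problems = flow.evaluate(observations)
for line in problems:
    print(line)
```

In this sketch the specification lives apart from the infrastructure being monitored. The ADI side only has to report the named metrics for each step, which loosely mirrors point (ii) of the abstract. A metric that is never reported is flagged as a problem rather than silently ignored.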

Notes

  1. CORE - The UK Open Access Aggregator, https://core.ac.uk.

  2. Europeana, http://www.europeana.eu.

  3. Prometheus, http://prometheus.io.

  4. The Elastic stack, https://www.elastic.co.

  5. The OpenAIRE EU project, http://www.openaire.eu.

  6. Apache HBase, https://hbase.apache.org.

  7. Taverna, http://www.taverna.org.uk.

Author information

Corresponding author

Correspondence to Andrea Mannocci.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mannocci, A., Manghi, P. (2016). DataQ: A Data Flow Quality Monitoring System for Aggregative Data Infrastructures. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_28

  • DOI: https://doi.org/10.1007/978-3-319-43997-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43996-9

  • Online ISBN: 978-3-319-43997-6

  • eBook Packages: Computer Science, Computer Science (R0)
