Usability of Visual Data Profiling in Data Cleaning and Transformation

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2017 Conferences (OTM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10574)

Abstract

This paper proposes an approach for using visual data profiling in tabular data cleaning and transformation processes. Visual data profiling is the statistical assessment of datasets to identify and visualize potential quality issues. The proposed approach was implemented in a software prototype and empirically validated in a usability study to determine how useful visual data profiling is and how easy data scientists find it to use. The study involved 24 users in a comparative usability test and 4 expert reviewers in cognitive walkthroughs. The evaluation results show that users find visual data profiling capabilities useful and easy to use when cleaning and transforming data.
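
To make the notion of statistical assessment concrete, the following is a minimal sketch of profiling a single column of tabular data. It is an illustration only and is not taken from the paper's prototype: the function name `profile_column`, the rule that blank strings count as missing, and the 1.5 × IQR outlier threshold are all assumptions chosen for this example.

```python
from statistics import quantiles


def profile_column(values):
    """Summarise one column: missing cells, distinct values, non-numeric cells, outliers.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Assumption: None and blank strings are treated as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    missing = len(values) - len(present)

    # Split the remaining cells into numeric and non-numeric ones.
    numeric, non_numeric = [], []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            non_numeric.append(v)

    # Flag numeric outliers with the common 1.5 * IQR rule (an assumption here).
    outliers = []
    if len(numeric) >= 4:
        q1, _, q3 = quantiles(numeric, n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers = [x for x in numeric if x < low or x > high]

    return {
        "rows": len(values),
        "missing": missing,
        "distinct": len(set(present)),
        "non_numeric": non_numeric,
        "outliers": outliers,
    }


# Example column with a blank cell, a None, a text placeholder, and an extreme value.
report = profile_column(["10", "12", "11", "", None, "13", "n/a", "400"])
print(report)
```

A visual profiling tool would typically render summaries like these as charts (for example, a bar of missing versus present cells or a box plot showing the outliers) rather than as a dictionary.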

Acknowledgements

The work in this paper is partly supported by the EC-funded projects proDataMarket (Grant number: 644497), euBusinessGraph (Grant number: 732003), and EW-Shopp (Grant number: 732590).

Author information

Corresponding author

Correspondence to Bjørn Marius von Zernichow.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

von Zernichow, B.M., Roman, D. (2017). Usability of Visual Data Profiling in Data Cleaning and Transformation. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_32

  • DOI: https://doi.org/10.1007/978-3-319-69459-7_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69458-0

  • Online ISBN: 978-3-319-69459-7

  • eBook Packages: Computer Science, Computer Science (R0)
