A Data Quality Improvement Method Based on the Greedy Algorithm

  • Conference paper
Machine Learning and Intelligent Communications (MLICOM 2019)

Abstract

High-quality data is essential for data analysis and mining. Data quality can be measured by many indicators, and several methods have been proposed that improve data quality by improving one or more of these indicators. However, few works discuss how the order in which data quality indicators are processed affects the overall data quality. In this paper, first, some data quality indicators and their improvement methods are given; second, the impact of the processing order of data quality indicators on the overall data quality is discussed; and third, a novel data quality improvement method based on the greedy algorithm is proposed. Experiments show that the proposed method can improve data quality while reducing time and computational costs.
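
To illustrate why the processing order of indicators can matter, the following is a minimal Python sketch of a greedy ordering of improvement steps. It is not the method from the paper: the two indicators (completeness and uniqueness), the equal-weight overall score, the two cleaning steps, and all function names are assumptions chosen only for this toy example. At each round the sketch tries every remaining step on the current data and applies the one that yields the highest overall score.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[int]]
Data = List[Row]


def completeness(data: Data) -> float:
    """Fraction of cells that are not missing (None). Assumed indicator."""
    cells = [value for row in data for value in row]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def uniqueness(data: Data) -> float:
    """Fraction of rows that are distinct. Assumed indicator."""
    return len(set(data)) / len(data) if data else 1.0


def overall_quality(data: Data) -> float:
    """Equal-weight mean of the two indicators (an assumption of this sketch)."""
    return (completeness(data) + uniqueness(data)) / 2


def drop_incomplete(data: Data) -> Data:
    """Hypothetical improvement step: remove rows with a missing cell."""
    return [row for row in data if None not in row]


def drop_duplicates(data: Data) -> Data:
    """Hypothetical improvement step: keep the first copy of each row."""
    return list(dict.fromkeys(data))


def greedy_order(
    data: Data, steps: Dict[str, Callable[[Data], Data]]
) -> Tuple[List[str], Data]:
    """Repeatedly apply the remaining step with the best overall score."""
    remaining = dict(steps)
    order: List[str] = []
    while remaining:
        # Evaluate every remaining step on the current data.
        name, candidate = max(
            ((n, step(data)) for n, step in remaining.items()),
            key=lambda pair: overall_quality(pair[1]),
        )
        # Stop early if no step improves the overall score.
        if overall_quality(candidate) <= overall_quality(data):
            break
        data = candidate
        order.append(name)
        del remaining[name]
    return order, data


rows: Data = [("a", 1), ("a", 1), ("b", None), ("c", 3)]
steps = {"drop_incomplete": drop_incomplete, "drop_duplicates": drop_duplicates}
order, cleaned = greedy_order(rows, steps)
print(order, cleaned, overall_quality(cleaned))
```

On this toy input, removing duplicates first gives a larger immediate gain than removing incomplete rows first, so the greedy choice processes uniqueness before completeness. A greedy choice of this kind is only locally optimal; whether it reaches the best overall ordering depends on the indicators and steps involved.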

Acknowledgments

This work was supported by the State Grid Corporation Science and Technology Project (Contract No.: SGLNXT00YJJS1800110).

Author information

Corresponding author

Correspondence to Chunhe Song.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wang, Z., Fu, Y., Song, C., Ge, W., Qiao, L., Zhang, H. (2019). A Data Quality Improvement Method Based on the Greedy Algorithm. In: Zhai, X., Chen, B., Zhu, K. (eds) Machine Learning and Intelligent Communications. MLICOM 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-32388-2_22

  • DOI: https://doi.org/10.1007/978-3-030-32388-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32387-5

  • Online ISBN: 978-3-030-32388-2

  • eBook Packages: Computer Science, Computer Science (R0)
