DART: A Data Acquisition and Repairing Tool

  • Conference paper
Current Trends in Database Technology – EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4254)

Abstract

This paper proposes an architecture for robust acquisition of data from input documents containing tabular data. The architecture is based on a data-repairing framework that exploits integrity constraints defined on the input data to detect and repair inconsistencies arising from errors in the acquisition phase. In particular, it defines a specific but expressive form of integrity constraint (steady aggregate constraints) that enables the computation of a repair to be expressed as a mixed integer linear programming problem.
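To make the idea of constraint-driven repair concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not spell out the definition of steady aggregate constraints or the MILP encoding, so the sketch assumes constraints are plain linear equalities over named cells and swaps the MILP solver for a brute-force search over repairs that change at most one cell. All names (`violated`, `single_cell_repairs`, the cash-budget cells) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: a constraint maps cell names to integer coefficients and
# holds when sum(coefficient * cell value) == 0.
Constraint = Dict[str, int]


def violated(values: Dict[str, int], constraints: List[Constraint]) -> List[Constraint]:
    """Return the constraints that the acquired values do not satisfy."""
    return [
        c for c in constraints
        if sum(k * values[name] for name, k in c.items()) != 0
    ]


def single_cell_repairs(values: Dict[str, int],
                        constraints: List[Constraint]) -> List[Dict[str, int]]:
    """Brute-force stand-in for the MILP: list every repair that changes
    at most one cell and makes all constraints hold.

    Returns [{}] if the data is already consistent, and [] if no
    single-cell repair exists.
    """
    broken = violated(values, constraints)
    if not broken:
        return [{}]

    repairs = []
    for cell, old in values.items():
        # A single change can only help if the cell occurs in every
        # violated constraint.
        if any(cell not in c for c in broken):
            continue
        # Solve the first violated constraint for this cell.
        first = broken[0]
        coeff = first[cell]
        total = sum(k * values[name] for name, k in first.items())
        rest = total - coeff * old          # contribution of the other cells
        if coeff == 0 or rest % coeff != 0:
            continue                        # no integer value fixes it
        new = -rest // coeff
        # Keep the candidate only if it satisfies all constraints.
        trial = dict(values, **{cell: new})
        if not violated(trial, constraints):
            repairs.append({cell: new})
    return repairs


# Hypothetical cash-budget table in which total_receipts was misread.
acquired = {
    "cash_sales": 100,
    "receivables": 120,
    "total_receipts": 250,
    "disbursements": 160,
    "net_inflow": 60,
}
constraints = [
    # cash_sales + receivables = total_receipts
    {"cash_sales": 1, "receivables": 1, "total_receipts": -1},
    # total_receipts - disbursements = net_inflow
    {"total_receipts": 1, "disbursements": -1, "net_inflow": -1},
]

print(single_cell_repairs(acquired, constraints))
```

In this example both constraints are violated and `total_receipts` is the only cell they share, so the search proposes a single candidate repair for it. Brute force is only workable for toy inputs; an optimization formulation such as the MILP mentioned in the abstract is what would let repairs involving several cells be computed on realistic documents.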

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fazzinga, B., Flesca, S., Furfaro, F., Parisi, F. (2006). DART: A Data Acquisition and Repairing Tool. In: Grust, T., et al. Current Trends in Database Technology – EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 4254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11896548_25

  • DOI: https://doi.org/10.1007/11896548_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46788-5

  • Online ISBN: 978-3-540-46790-8

  • eBook Packages: Computer Science, Computer Science (R0)
