Statistical Relational Data Integration for Information Extraction

Niepert, Mathias

doi:10.1007/978-3-642-39784-4_7

Mathias Niepert

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8067)

Included in the following conference series:

Reasoning Web International Summer School

Abstract

These lecture notes provide a brief overview of some state-of-the-art, large-scale information extraction projects. Subsequently, these projects are related to current research activities in the Semantic Web community. The majority of the learning algorithms developed for these information extraction projects are based on the lexical and syntactic processing of Wikipedia and large web corpora. Due to the size of the processed data and the resulting intractability of the associated inference problems, existing knowledge representation formalisms are often inadequate for the task. We present recent advances in combining tractable logical and probabilistic models that bring statistical language processing and rule-based approaches closer together. With these lecture notes, we hope to convince the attendees that numerous synergies and research agendas can arise when uncertainty-based, data-driven research meets rule-based, schema-driven research. We also describe certain theoretical and practical advances in making probabilistic inference scale to very large problems.

These lecture notes are based on several previous publications of the author and his colleagues in conference proceedings such as AAAI, UAI, IJCAI, and ESWC.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of Washington, Seattle, WA, USA
Mathias Niepert

Editor information

Editors and Affiliations

Fakultät Informatik, Technische Universität Dresden, Nöthnitzer Str. 46, 01062, Dresden, Germany
Sebastian Rudolph
Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Georg Gottlob & Ian Horrocks
Department of Computer Science, Vrije Universiteit Amsterdam, de Boelelaan 1081a, 1081 HV, Amsterdam, The Netherlands
Frank van Harmelen

About this chapter

Cite this chapter

Niepert, M. (2013). Statistical Relational Data Integration for Information Extraction. In: Rudolph, S., Gottlob, G., Horrocks, I., van Harmelen, F. (eds) Reasoning Web. Semantic Technologies for Intelligent Data Access. Reasoning Web 2013. Lecture Notes in Computer Science, vol 8067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39784-4_7

DOI: https://doi.org/10.1007/978-3-642-39784-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39783-7
Online ISBN: 978-3-642-39784-4
eBook Packages: Computer Science, Computer Science (R0)