Abstract
These lecture notes provide a brief overview of some state-of-the-art large-scale information extraction projects. These projects are related to current research activities in the semantic web community. The majority of the learning algorithms developed for these information extraction projects are based on the lexical and syntactic processing of Wikipedia and large web corpora. Due to the size of the processed data and the resulting intractability of the associated inference problems, existing knowledge representation formalisms are often inadequate for the task. We present recent advances in combining tractable logical and probabilistic models that bring statistical language processing and rule-based approaches closer together. With these lecture notes we hope to convince the attendees that numerous synergies and research agendas can arise when uncertainty-based, data-driven research meets rule-based, schema-driven research. We also describe certain theoretical and practical advances in making probabilistic inference scale to very large problems.
These lecture notes are based on several previous publications of the author and his colleagues in conference proceedings such as AAAI, UAI, IJCAI, and ESWC.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Niepert, M. (2013). Statistical Relational Data Integration for Information Extraction. In: Rudolph, S., Gottlob, G., Horrocks, I., van Harmelen, F. (eds) Reasoning Web. Semantic Technologies for Intelligent Data Access. Reasoning Web 2013. Lecture Notes in Computer Science, vol 8067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39784-4_7
DOI: https://doi.org/10.1007/978-3-642-39784-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39783-7
Online ISBN: 978-3-642-39784-4
eBook Packages: Computer Science, Computer Science (R0)