Abstract
We present a novel way of extracting a categorial grammar from annotated data. Using the sentences from the Paris VII annotated treebank [2] as our starting point, we use a tree transducer to convert the annotated trees from the corpus into categorial grammar derivations.
We describe both the formal aspects and the implementation of the tree transducer, which is a conservative extension of standard tree transducers allowing a compact specification of the transduction rules relevant for our purposes, and we discuss the specific set of transduction rules we use to convert the corpus into AB grammar derivation trees.
Evaluating the resulting tree transducer on the entire corpus, we find that it yields lexical entries for 90.0% of the corpus, though it produces complete derivations for only 75% of all sentences in the corpus.
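As a rough illustration of the general idea only, the sketch below pushes a result type down a small annotated tree and reads off AB lexical types at the leaves. It is not the transducer or the rule set described in this chapter: the tables FUNCTOR and ATOM, the function names, and the example sentence are all hypothetical, and the tree labels are merely modelled on treebank-style labels.

```python
# Toy top-down pass from a phrase-structure tree to AB lexical types.
# FUNCTOR and ATOM are hypothetical tables, NOT the chapter's rule set.
# A tree is (pos_tag, word) for a leaf or (label, [children]) for a node.

FUNCTOR = {"SENT": "VN", "NP": "DET"}        # child that acts as the functor
ATOM = {"SENT": "s", "NP": "np", "N": "n"}   # atomic type of argument children


def par(t):
    """Parenthesise a complex type so that nesting stays unambiguous."""
    return t if "/" not in t and "\\" not in t else "(" + t + ")"


def transduce(tree, typ):
    """Return (word, type) pairs for the leaves of `tree`, whose root derives `typ`."""
    label, body = tree
    if isinstance(body, str):                # leaf: the word receives the type
        return [(body, typ)]
    if len(body) == 1:                       # unary node: pass the type down
        return transduce(body[0], typ)
    labels = [child[0] for child in body]
    functor = FUNCTOR.get(label)
    if functor not in labels:
        raise ValueError("no rule applies at node " + label)
    i = labels.index(functor)
    if not all(l in ATOM for j, l in enumerate(labels) if j != i):
        raise ValueError("untyped argument under node " + label)
    ftype = typ
    for l in labels[:i]:                     # leftmost argument is consumed last
        ftype = par(ATOM[l]) + "\\" + par(ftype)
    for l in reversed(labels[i + 1:]):       # nearest right argument is consumed first
        ftype = par(ftype) + "/" + par(ATOM[l])
    out = []
    for j, child in enumerate(body):
        out.extend(transduce(child, ftype if j == i else ATOM[child[0]]))
    return out


tree = ("SENT", [("NP", [("DET", "le"), ("N", "chat")]),
                 ("VN", [("V", "dort")])])
lexicon = transduce(tree, "s")
```

Under these assumed tables, the determiner receives np/n, the noun n, and the verb np\s. A node that no rule covers raises an error, which loosely mirrors the situation where lexical entries are found for part of a sentence but no complete derivation is produced.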
References
Abeillé, A., Clément, L.: Annotation morpho-syntaxique (2003), http://llf.linguist.jussieu.fr
Abeillé, A., Clément, L., Toussenel, F.: Building a treebank for French. Treebanks. Kluwer, Dordrecht (2003)
Besombes, J., Marion, J.: Learning tree languages from positive examples and membership queries. In: Ben-David, S., Case, J., Maruoka, A. (eds.) ALT 2004. LNCS (LNAI), vol. 3244, pp. 440–453. Springer, Heidelberg (2004)
Buszkowski, W., Penn, G.: Categorial grammars determined from linguistic data by unification. Studia Logica 49(4), 431–454 (1990), http://dx.doi.org/10.1007/BF00370157
Chomsky, N.: Lectures on government and binding (1981)
Clark, S., Curran, J.: Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics 33 (2007)
Comon, H., Dauchet, M., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications (1997), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.125.6165
Costa Florêncio, C.: Consistent identification in the limit of any of the classes k-valued is NP-hard. In: de Groote, P., Morrill, G., Retoré, C. (eds.) LACL 2001. LNCS (LNAI), vol. 2099, pp. 125–138. Springer, Heidelberg (2001)
Engelfriet, J., Vogler, H.: The translation power of top-down tree-to-graph transducers. Journal of Computer and System Sciences 49(2) (1993)
Gold, E.M.: Language identification in the limit. Information and Control 10(5) (1967)
Hockenmaier, J.: Data and models for statistical parsing with combinatory categorial grammar (2003)
Hockenmaier, J.: Creating a CCGbank and a wide-coverage CCG lexicon for German. In: Proceedings of COLING/ACL, Sydney (2006)
Kanazawa, M.: Learnable Classes of Categorial Grammars. Center for the Study of Language and Information, Stanford University (1998), http://csli-www.stanford.edu/publications/
Knight, K., Graehl, J.: An overview of probabilistic tree transducers for natural language processing. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 1–24. Springer, Heidelberg (2005)
Kraak, E.: A deductive account of French object clitics. In: Syntax and Semantics, pp. 271–312 (1998)
Lambek, J.: The mathematics of sentence structure. The American Mathematical Monthly 65(3), 154–170 (1958), http://www.jstor.org/stable/2310058
Levy, R., Andrew, G.: Tregex and tsurgeon: tools for querying and manipulating tree data structures (2006), http://nlp.stanford.edu/software/tregex.shtml
Moortgat, M.: Categorial type logics. In: Handbook of Logic and Language, pp. 93–177 (1997), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.5803
Moot, R.: Automated extraction of type-logical supertags from the Spoken Dutch Corpus. In: Complexity of Lexical Descriptions and its Relevance to Natural Language Processing: A Supertagging Approach (2010)
Moot, R.: Semi-automated extraction of a wide-coverage type-logical grammar for French. In: Proceedings TALN 2010, Montréal (2010)
Moot, R., Retoré, C.: Les indices pronominaux du français dans les grammaires catégorielles. Lingvisticae Investigationes 29(1), 137–146 (2006)
Morrill, G.V.: Type Logical Grammar: Categorial Logic of Signs. Springer, Heidelberg (1994)
Sandillon-Rezer, N. (2011), http://www.labri.fr/perso/nfsr/
Steedman, M.: The syntactic process (200)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sandillon-Rezer, NF., Moot, R. (2011). Using Tree Transducers for Grammatical Inference. In: Pogodalla, S., Prost, JP. (eds) Logical Aspects of Computational Linguistics. LACL 2011. Lecture Notes in Computer Science, vol 6736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22221-4_16
DOI: https://doi.org/10.1007/978-3-642-22221-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22220-7
Online ISBN: 978-3-642-22221-4
eBook Packages: Computer Science (R0)