Part of the book series: Text, Speech and Language Technology (TLTB, volume 9)

Abstract

As we saw in the previous chapter, rules can be used to select an appropriate tag for each token. This chapter continues the investigation of rules. In the previous chapter the rules were written manually, based on someone’s linguistic knowledge and familiarity with the properties of the corpus; here we explore the possibility of learning tagging rules automatically. A potential advantage of automatic rule learning is that such a system could, in theory, be highly portable across both domains and languages: if training material is available, it can be retrained with little or no human intervention. A limitation is that such a system can only learn facts expressible in the learner's prespecified descriptive language, which restricts the types of rules that can be learned. For example, a person might discover that a word tends to receive one particular tag when it occurs toward the end of a sentence. If the learner has no access to the concepts of sentence length and position within a sentence, discovering such a heuristic rule is beyond the capability of the learning algorithm. What differentiates this approach from other machine learning approaches, such as training neural networks (cf. Chapter 17) or hidden Markov models (HMMs; cf. Chapter 16), is that the learned information is in a form that people can understand, edit, and improve, just as with manually written rules.
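
The limitation described in the abstract can be illustrated with a toy learner. The sketch below is not the chapter's algorithm; it is a minimal illustration that assumes rules of the form "retag A as B in context C", where the available contexts are fixed in advance by a set of templates. The template names (`prev_tag`, `is_last`), the function `learn_best_rule`, the four-sentence corpus, the Penn Treebank style tags, and the baseline that always tags "up" as `IN` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical templates: each names one contextual feature the learner may inspect.
def prev_tag(words, tags, i):
    return tags[i - 1] if i > 0 else "<s>"

def is_last(words, tags, i):
    return i == len(words) - 1

def learn_best_rule(sentences, gold, current, templates):
    """Return (net_score, rule) for the candidate rule that removes the most errors.

    A rule is (template, value, from_tag, to_tag): retag from_tag as to_tag
    wherever the template's feature equals value.
    """
    fixes, breaks = Counter(), Counter()
    for words, g, c in zip(sentences, gold, current):
        for i in range(len(words)):
            for name, feature in templates.items():
                context = (name, feature(words, c, i), c[i])
                if c[i] != g[i]:
                    fixes[context + (g[i],)] += 1  # the rule would repair this token
                else:
                    breaks[context] += 1           # any rule firing here adds an error
    scored = [(n - breaks[rule[:3]], rule) for rule, n in fixes.items()]
    return max(scored, default=(0, None))

sentences = [s.split() for s in
             ["she looked up", "he gave up", "they walked up hills", "we ran up stairs"]]
gold = [t.split() for t in
        ["PRP VBD RP", "PRP VBD RP", "PRP VBD IN NNS", "PRP VBD IN NNS"]]
# Assumed baseline: "up" is always tagged IN; every other token is already correct.
current = [["IN" if w == "up" else t for w, t in zip(ws, ts)]
           for ws, ts in zip(sentences, gold)]

without_position = learn_best_rule(sentences, gold, current, {"prev_tag": prev_tag})
with_position = learn_best_rule(sentences, gold, current,
                                {"prev_tag": prev_tag, "is_last": is_last})
print(without_position)
print(with_position)
```

With only the `prev_tag` template, the best candidate has a net score of 0: it repairs the two sentence-final cases but breaks the two correct ones, because the preceding tag is the same in all four sentences. Once the `is_last` template is added to the descriptive language, the learner finds a rule that repairs both errors without introducing new ones.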

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Brill, E. (1999). Corpus-Based Rules. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Text, Speech and Language Technology, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9273-4_15

  • DOI: https://doi.org/10.1007/978-94-015-9273-4_15

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5296-4

  • Online ISBN: 978-94-015-9273-4

  • eBook Packages: Springer Book Archive
