Abstract
There is a close relationship between formal language theory and data compression. Since the 1990s, various types of grammar-based text compression algorithms have been introduced. Given an input string, a grammar-based text compression algorithm constructs a context-free grammar that generates only that string. An interesting and challenging problem is pattern matching on context-free grammars \(\mathcal{P}\) of size m and \(\mathcal{T}\) of size n, which describe a pattern string P of length M and a text string T of length N, respectively. The goal is to solve the problem in time proportional only to m and n, not to M or N. Kieffer et al. introduced a very practical grammar-based compression method called multilevel pattern matching code (MPM code). In this paper, we propose an efficient pattern matching algorithm which, given two MPM grammars \(\mathcal{P}\) and \(\mathcal{T}\), runs in \(O(mn^2)\) time with \(O(mn)\) space. Our algorithm outperforms the previous best one, by Miyazaki et al., which requires \(O(m^2 n^2)\) time and \(O(mn)\) space.
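To make the idea of a grammar that generates exactly one string concrete, the following is a minimal Python sketch. It is a generic toy grammar, not the MPM construction and not the matching algorithm proposed in the paper. The rule set `RULES` and the helper names `expand`, `derived_lengths`, and `char_at` are hypothetical, chosen only for illustration. The sketch shows how some questions about the derived string can be answered from the grammar alone, without decompressing it.

```python
# Hypothetical toy grammar: each variable derives either a single character
# or the concatenation of two other variables. It generates exactly one string.
RULES = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # derives "ab"
    "X4": ("X3", "X3"),  # derives "abab"
    "X5": ("X4", "X4"),  # derives "abababab"
}


def expand(var, rules):
    """Fully decompress the string derived from var (time grows with N)."""
    rhs = rules[var]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return expand(left, rules) + expand(right, rules)


def derived_lengths(rules):
    """Length of the string derived from every variable, without expanding.

    Each variable is evaluated once, so the work grows with the number of
    rules rather than with the length of the derived string.
    """
    lengths = {}

    def length(var):
        if var not in lengths:
            rhs = rules[var]
            if isinstance(rhs, str):
                lengths[var] = 1
            else:
                lengths[var] = length(rhs[0]) + length(rhs[1])
        return lengths[var]

    for var in rules:
        length(var)
    return lengths


def char_at(var, index, rules, lengths):
    """Return the character at 0-based index of the string derived from var,
    by walking down the grammar instead of decompressing."""
    if not 0 <= index < lengths[var]:
        raise IndexError(index)
    rhs = rules[var]
    while not isinstance(rhs, str):
        left, right = rhs
        if index < lengths[left]:
            var = left
        else:
            index -= lengths[left]
            var = right
        rhs = rules[var]
    return rhs


if __name__ == "__main__":
    lengths = derived_lengths(RULES)
    print(lengths["X5"])
    print(char_at("X5", 5, RULES, lengths))
    print(expand("X5", RULES))
```

Here five rules describe a string of length eight. With repeated doubling the gap between grammar size and string length grows exponentially, which is why running times stated in terms of m and n rather than M and N matter.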
References
Bryant, R.E.: Symbolic boolean manipulation with ordered binary decision diagrams. ACM Computing Surveys 24, 293–318 (1992)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: Proc. STOC 2002, pp. 792–801 (2002)
Crochemore, M., Rytter, W.: Text Algorithms. Oxford University Press, New York (1994)
Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2002)
Gage, P.: A new algorithm for data compression. The C Users Journal 12(2) (1994)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York (1997)
Inenaga, S., Shinohara, A., Takeda, M.: A fully compressed pattern matching algorithm for simple collage systems. In: Proc. PSC 2004, pp. 98–113. Czech Technical University (2004)
Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic J. Comput. 4(2), 172–186 (1997)
Kieffer, J., Yang, E.: Grammar-based codes: a new class of universal lossless source codes. IEEE Transactions on Information Theory 46(3), 737–754 (2000)
Kieffer, J., Yang, E.: Grammar-based codes for universal lossless data compression. Communications in Information and Systems 2(2), 29–52 (2002)
Kieffer, J., Yang, E., Nelson, G., Cosman, P.: Universal lossless compression via multilevel pattern matching. IEEE Transactions on Information Theory 46(4), 1227–1245 (2000)
Larsson, J., Moffat, A.: Offline dictionary-based compression. In: Proc. DCC 1999, pp. 296–305. IEEE Computer Society Press, Los Alamitos (1999)
Miyazaki, M., Shinohara, A., Takeda, M.: An improved pattern matching algorithm for strings in terms of straight line programs. Journal of Discrete Algorithms 1(1), 187–204 (2000)
Nevill-Manning, C., Witten, I.: Compression and explanation using hierarchical grammars. Computer Journal 40(2/3), 103–116 (1997)
Nevill-Manning, C., Witten, I.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artificial Intelligence Research 7, 67–82 (1997)
Nevill-Manning, C., Witten, I.: Inferring lexical and grammatical structure from sequences. In: Proc. DCC 1997, pp. 265–274. IEEE Computer Society Press, Los Alamitos (1997)
Nevill-Manning, C., Witten, I.: Phrase hierarchy inference and compression in bounded space. In: Proc. DCC 1998, pp. 179–188. IEEE Computer Society Press, Los Alamitos (1998)
Rytter, W.: Algorithms on compressed strings and arrays. In: Bartosek, M., Tel, G., Pavelka, J. (eds.) SOFSEM 1999. LNCS, vol. 1725, pp. 48–65. Springer, Heidelberg (1999)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Comput. Sci. 302(1–3), 211–222 (2003)
Woelfel, P.: Symbolic topological sorting with OBDDs. In: Rovan, B., Vojtáš, P. (eds.) MFCS 2003. LNCS, vol. 2747, pp. 671–680. Springer, Heidelberg (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Inenaga, S., Shinohara, A., Takeda, M. (2004). An Efficient Pattern Matching Algorithm on a Subclass of Context Free Grammars. In: Calude, C.S., Calude, E., Dinneen, M.J. (eds) Developments in Language Theory. DLT 2004. Lecture Notes in Computer Science, vol 3340. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30550-7_19
DOI: https://doi.org/10.1007/978-3-540-30550-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24014-3
Online ISBN: 978-3-540-30550-7
eBook Packages: Computer Science (R0)