1 Introduction

\(\mathbb {K}\) [11] is a framework for formally defining the semantics of programming languages. The current version of \(\mathbb {K}\) includes options that have Maude [3] as a backend: the \(\mathbb {K}\) compiler transforms any \(\mathbb {K}\) definition into a Maude module; then, the \(\mathbb {K}\) runner uses Maude to run or analyze programs in the defined language.

Recently, \(\mathbb {K}\) has been extended with symbolic execution support [2]. Briefly, a \(\mathbb {K}\) language definition is automatically transformed into a symbolic-language definition, such that the concrete executions of programs using the symbolic definition are symbolic executions of programs using the original language definition. The transformation amounts to incorporating path conditions in program configurations, and to changing the language’s semantic rules so that they match on symbolic configurations and automatically update the path conditions.

Symbolic executions are called feasible if their path conditions are satisfiable. Two results relating concrete and symbolic program executions are proved in [2]: coverage, saying that for each concrete execution there is a feasible symbolic one taking the same path on the program; and precision, saying that for each feasible symbolic execution there is a concrete one taking the same program path.

In this paper we propose two ways of representing \(\mathbb {K}\) language definitions in Maude: a faithful representation and an approximate one. We then study the relationships between \(\mathbb {K}\) language definitions (including the symbolic ones, obtained by the above-described transformation) and their representations in Maude. We also show how the coverage and precision results, which relate a language \(\mathcal {L}\) and its symbolic extension \({\mathcal {L}}^{\mathfrak {s}}\), are reflected on their respective representations in Maude. These results show, in particular, how (symbolic) analyses performed with Maude tools on the (faithful and approximate) Maude representations of languages can be lifted up to the original language definitions. The various results that we have obtained can be graphically depicted as in the following diagram (dashed arrows show the results proved in the paper):

figure a

In the faithful encoding, each semantic rule of the language definition \(\mathcal {L}\) is translated into a rewrite rule of the rewrite theory \(\mathcal {R}(\mathcal {L})\). Equations are only introduced in order to express equality in the data domain. The resulting rewrite theory is proved to be executable by Maude, and the transition system generated by the language definition is shown to be isomorphic to the one generated by the rewrite theory. Some variations of this encoding are also discussed, all of which satisfy the executability and faithfulness properties. As a consequence, both positive and negative results of reachability analyses obtained on rewrite theories (i.e., by using the Maude search command) also hold on the original language definitions. Moreover, all symbolic reachability analysis results obtained on the rewrite-theory representation \(\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})\) of a symbolic language \({\mathcal {L}}^{\mathfrak {s}}\) also hold on the rewrite-theory representation \(\mathcal {R}(\mathcal {L})\) of the language \(\mathcal {L}\). The latter property is analogous to the results obtained in [10], where rewriting modulo SMT is shown to be related to (usual) rewriting in a sound and complete way.

For nontrivial language definitions, the faithful encoding is not very practical, because it typically generates a huge state-space that is not amenable to reachability analysis. This is why we introduce approximate representations of language definitions as two-layered rewrite theories. These approximations are obtained by splitting the semantic rules of the language into two sets, called layers, such that the first layer forms a terminating rewrite system. The one-step rewriting in such a theory is obtained by computing an irreducible form w.r.t. rules from the first layer (according to a given strategy), and then applying a rule from the second layer. A simple example of a two-layered rewrite theory is a Maude module consisting of equations and rules, where the equations (denoting the first layer) are only required to be terminating, and both the equations and rules (which form the second layer) specify transitions in the underlying transition-system model of the theory.
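The one-step relation of a two-layered theory can be illustrated with a minimal Python sketch. It is not the construction used in the paper: the states (plain integers), the rules `halve`, `inc` and `triple`, and the "first applicable rule wins" strategy are all hypothetical choices made only to show the shape of the computation (normalise with the first layer, then apply a second-layer rule).

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical toy setting: states are integers.
Layer1Rule = Callable[[int], Optional[int]]   # successor, or None if not applicable
Layer2Rule = Callable[[int], Iterable[int]]   # all successors via this rule


def normalize(state: int, layer1: List[Layer1Rule]) -> int:
    """Apply first-layer rules until none applies.

    The strategy is fixed: the first applicable rule in the list is used.
    The loop terminates only if the first layer is terminating.
    """
    while True:
        for rule in layer1:
            nxt = rule(state)
            if nxt is not None:
                state = nxt
                break
        else:  # no first-layer rule applied: state is irreducible
            return state


def two_layer_step(state: int, layer1: List[Layer1Rule],
                   layer2: List[Layer2Rule]) -> List[int]:
    """One step: compute an irreducible form, then apply a second-layer rule."""
    irreducible = normalize(state, layer1)
    return [succ for rule in layer2 for succ in rule(irreducible)]


# Hypothetical rules, for illustration only.
def halve(n: int) -> Optional[int]:
    return n // 2 if n > 0 and n % 2 == 0 else None


def inc(n: int) -> List[int]:
    return [n + 1]


def triple(n: int) -> List[int]:
    return [3 * n]


print(two_layer_step(12, [halve], [inc, triple]))
```

Because `normalize` follows a single strategy, intermediate states such as 6 are never exposed as states of the transition system, which is the sense in which such a theory only approximates the original rule set.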

In an (approximating) two-layered rewrite theory \(\mathfrak {R}(\mathcal {L})\), only a subset of the executions of programs in the original language \(\mathcal {L}\) are represented. The consequence is that only positive results of reachability analyses on the two-layered rewrite theories can be lifted up to the corresponding language definitions. In addition to reducing the state-space to be explored, the approximate encoding of a language by a two-layered rewrite theory can also be seen as the output of a compiler that resolves, at compile-time, some semantic choices left open by the language definition. For example, in C, the order in which the operands of addition are evaluated is a compile-time choice. By turning the operand-evaluation rules into first-layer rules, and by letting Maude automatically execute these rules in various orders according to certain strategies, one can reproduce the various compile-time design choices for the evaluation of arguments.

We note that approximating two-layered rewrite theories have some limitations: only the coverage property relating the language definition \(\mathcal {L}\) to its symbolic version \({\mathcal {L}}^{\mathfrak {s}}\) also holds on their respective approximate encodings; the precision property holds only in some restricted cases. However, the precision property between the approximate symbolic encoding \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\) and the language definition \(\mathcal {L}\) always holds. Hence, one can trace symbolic reachability analyses (performed on \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\)) back to programs in \(\mathcal {L}\), and also (in some restricted cases) to the representation of programs in \(\mathfrak {R}(\mathcal {L})\), which, as discussed above, can be seen as compiled programs where some semantic choices are left to the compiler.

Organisation. In Sect. 2 we present our working examples, which are two programs belonging to the CinK kernel of C++, which was specified in \(\mathbb {K}\) [7]. A partial description of the \(\mathbb {K}\) definition for CinK is included. In Sect. 3 we introduce a formal notion of a language-definition framework, which allows us to make our approach independent of the \(\mathbb {K}\) language definitional framework and to abstract away some particular implementation details of \(\mathbb {K}\). For the same reason, we will be using rewrite theories (instead of their implementations as Maude modules) for the encodings of language definitions. We also briefly present the language-independent symbolic execution approach [2] and recap some essential notions related to the executability of rewrite theories.

Section 4 presents the faithful and the approximate representations of language definitions into a rewrite theory and the various relations between them (graphically depicted in the above diagram). Section 5 presents the applications of these representations to the compilation of \(\mathbb {K}\) language definitions as Maude modules. Finally, Sect. 6 presents conclusions and related work.

2 Running Example

Our running example is CinK [7], a kernel of the C++ programming language. The \(\mathbb {K}\) definition of CinK can be found on the \(\mathbb {K}\) Framework GitHub repository: http://github.com/kframework/cink-semantics. As any \(\mathbb {K}\) definition, it consists of the language syntax, given using a BNF-style grammar, and of its semantics, given using rewrite rules on configurations. In this paper we only exhibit a small part of the \(\mathbb {K}\) definition of CinK, whose syntax is shown in Fig. 1. Some of the grammar productions are annotated with \(\mathbb {K}\)-specific attributes.

Fig. 1. CinK syntax

Fig. 2. CinK configuration

A major feature of C++ expressions is the “sequenced before” relation [1], which defines a partial order over the evaluation of subexpressions. This can be easily expressed in \(\mathbb {K}\) using the strict attribute to specify an evaluation order for an operation’s operands: if an operator is annotated with the strict attribute, then its operands are evaluated in a nondeterministic order. For instance, all the binary operations are strict. Hence, they may induce nondeterminism in programs because of possible side effects in their arguments.
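To see how an unspecified operand order combined with side effects yields several outcomes, consider the following small Python sketch. The operands `pre_increment` (standing for `++x`) and `read_x` (standing for `x`) are hypothetical, and operands are treated as atomic; the actual semantics interleaves evaluation at a finer grain through heating and cooling rules.

```python
from itertools import permutations


def pre_increment(store):
    """Hypothetical operand `++x`: updates x and yields the new value."""
    new_store = dict(store, x=store["x"] + 1)
    return new_store["x"], new_store


def read_x(store):
    """Hypothetical operand `x`: yields the current value, no side effect."""
    return store["x"], store


def results_of_sum(operands, store):
    """Evaluate the operands of `+` in every order and collect the sums."""
    sums = set()
    for order in permutations(range(len(operands))):
        current = dict(store)
        values = [None] * len(operands)
        for i in order:
            # Each operand sees the store left by the previously evaluated one.
            values[i], current = operands[i](current)
        sums.add(sum(values))
    return sums


# `++x + x` with x initially 0: left-first gives 1 + 1, right-first gives 1 + 0.
print(results_of_sum([pre_increment, read_x], {"x": 0}))
```

With side-effect-free operands every order produces the same sum, so the set of results is a singleton.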

Another feature is given by the classification of expressions into rvalues and lvalues. The arguments of binary operations are evaluated as rvalues and their results are also rvalues, while, e.g., both the argument of the prefix-increment operation and its result are lvalues. The strict attribute for such operations has a sub-attribute context for wrapping any subexpression that must be evaluated as an rvalue. Other attributes (\( funcall, divide, plus, minus, \dots \)) are names associated to each syntactic production, which can be used for referring to them.

The \(\mathbb {K}\) framework uses configurations to store program states. A configuration is a nested structure of cells, which typically include the program to be executed, input and output streams, values for program variables, and other additional information. The configuration of CinK (Fig. 2) includes the \({\langle \rangle }{}_\mathsf{k}\) cell containing the code that remains to be executed, which is represented as a list of computation tasks \(C_1\curvearrowright C_2\curvearrowright \ldots \) to be executed in the given order. Computation tasks are typically statements and expression evaluations. The memory is modeled using two cells \({\langle \rangle }{}_\mathsf{env}\) (which holds a map from variables to addresses) and \({\langle \rangle }{}_\mathsf{store}\) (which holds a map from addresses to values). The configuration also includes a cell for the function call stack and another one for the return values of functions.

When the configuration is initialised at runtime, a CinK program is loaded in the \({\langle \rangle }{}_\mathsf{k}\) cell, and all the other cells remain empty. A \(\mathbb {K}\) rule is a topmost rewrite rule specifying transitions between configurations. Since usually only a small part of the configuration is changed by a rule, a configuration abstraction mechanism is used, allowing one to only specify the parts transformed by the rule. For instance, the (abstract) rule for addition, shown in Fig. 3, represents the (concrete) rule

$$\begin{aligned}&{\langle {\langle {I_1\mathtt + I_2\curvearrowright C}\rangle }{}_\mathsf{k} {\langle {E}\rangle }{}_\mathsf{env}{\langle {S}\rangle }{}_\mathsf{store}{\langle {T}\rangle }{}_\mathsf{stack}{\langle {V}\rangle }{}_\mathsf{return}\rangle }{}_\mathsf{cfg}\\&{\varvec{\Rightarrow }}\\&{\langle {\langle {I_1+_{{Int}}I_2\curvearrowright C}\rangle }{}_\mathsf{k} {\langle {E}\rangle }{}_\mathsf{env}{\langle {S}\rangle }{}_\mathsf{store}{\langle {T}\rangle }{}_\mathsf{stack}{\langle {V}\rangle }{}_\mathsf{return}\rangle }{}_\mathsf{cfg} \end{aligned}$$
Fig. 3. Subset of rules from the K semantics of CinK

where \(\mathtt{+ }_{Int}\) is the mathematical operation for addition. Note that the ellipses in a cell (e.g., \({\langle {\;}{\cdot }{\cdot }{\cdot } \rangle }{}_\mathsf{k}\)) represent the part of the cell not affected by the rule.

The rule for division has a side condition which restricts its application. The conditional statement \(\mathtt{if }\) has two corresponding rules, one for each possible evaluation of the condition expression. The rule for the \(\mathtt{while }\) loop is unrolled into an \(\mathtt{if }\) statement. The increment and update rules have side effects in the \({\langle \rangle }{}_\mathsf{store}\) cell, modifying the value stored at a specific address. Finally, the reading of a value from the memory is specified by the lookup rule, which matches a value in the \({\langle \rangle }{}_\mathsf{store}\) and places it in the \({\langle \rangle }{}_\mathsf{k}\) cell. The auxiliary construct \(\mathtt{{\$lookup} }\) is used, e.g., when a program variable is evaluated as an rvalue.
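The way such rules act on a configuration can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Cfg` class, the tuple encoding of tasks, and the exact shape of the lookup rule are hypothetical (Fig. 3 is not reproduced here); only the addition rule follows the concrete rule displayed above.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional, Tuple


@dataclass
class Cfg:
    """Hypothetical flat encoding of the CinK configuration cells."""
    k: Tuple[Any, ...]                                  # tasks, head first
    env: Dict[str, int] = field(default_factory=dict)   # variable -> address
    store: Dict[int, int] = field(default_factory=dict) # address -> value
    stack: Tuple[Any, ...] = ()
    ret: Tuple[Any, ...] = ()


def rule_plus(c: Cfg) -> Optional[Cfg]:
    """<I1 + I2 ~> C>k  =>  <I1 +Int I2 ~> C>k; all other cells unchanged."""
    if c.k and isinstance(c.k[0], tuple) and len(c.k[0]) == 3:
        op, i1, i2 = c.k[0]
        if op == "+" and isinstance(i1, int) and isinstance(i2, int):
            return replace(c, k=(i1 + i2,) + c.k[1:])
    return None  # the rule does not match this configuration


def rule_lookup(c: Cfg) -> Optional[Cfg]:
    """Assumed shape: <$lookup(L) ~> C>k with L |-> V in store => <V ~> C>k."""
    if c.k and isinstance(c.k[0], tuple) and len(c.k[0]) == 2:
        op, loc = c.k[0]
        if op == "$lookup" and loc in c.store:
            return replace(c, k=(c.store[loc],) + c.k[1:])
    return None


cfg = Cfg(k=(("$lookup", 0), "rest"), env={"x": 0}, store={0: 7})
print(rule_lookup(cfg).k)
```

Each function mentions only the cells it reads or changes and copies the rest unchanged, which mirrors the configuration abstraction mechanism described above.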

In addition to these rules (written by the \(\mathbb {K}\) user), the \(\mathbb {K}\) framework automatically generates so-called heating and cooling rules, which are induced by strict attributes. We show only the case of division, which is strict in both arguments:

figure b

where \(\square \) is a special symbol, destined to receive the result of an evaluation.
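Since the figure with the heating and cooling rules is not reproduced here, the following Python sketch shows the usual shape of such rules for division. It is an assumption-laden illustration: the tuple encoding of the k cell, the names `heat` and `cool`, and the test `is_value` are hypothetical, and the generated rules in the actual definition may differ in detail.

```python
HOLE = "□"  # placeholder that receives the result of an evaluation


def is_value(t):
    """Assumption: integer constants are the only values."""
    return isinstance(t, int)


def is_div(t):
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "/"


def heat(k):
    """All k cells obtained by pulling one unevaluated operand of a head
    division to the front, leaving a hole in its place."""
    if not k or not is_div(k[0]):
        return []
    _, a1, a2 = k[0]
    out = []
    if not is_value(a1):
        out.append((a1, ("/", HOLE, a2)) + k[1:])
    if not is_value(a2):
        out.append((a2, ("/", a1, HOLE)) + k[1:])
    return out


def cool(k):
    """Plug an evaluated head value back into the hole of the next task."""
    if len(k) >= 2 and is_value(k[0]) and is_div(k[1]):
        _, a1, a2 = k[1]
        if a1 == HOLE:
            return (("/", k[0], a2),) + k[2:]
        if a2 == HOLE:
            return (("/", a1, k[0]),) + k[2:]
    return None


# Both operands are unevaluated, so heating offers two choices.
print(heat((("/", "x", "y"),)))
```

The two results of `heat` on a division with two unevaluated operands are the source of the nondeterministic evaluation order induced by the strict attribute.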

We shall be using the following two programs in the sequel. The program counter in Fig. 4 is nondeterministic; nondeterminism arises from the undefined evaluation order for the arguments of the + operation and from the side-effects in its arguments. The program log in the same figure is a symbolic one because A:Int is a symbolic value, which can denote any integer value. When it is completed the variable k holds \([\log _2(A)]\) where [_] denotes the integer part of a real number. In Sect. 5 we show how the behaviours of these programs can be analysed using our encodings of the CinK language as Maude programs.

Fig. 4. Two C++ programs

3 Background

3.1 The Ingredients of a Language Definition

In this section we identify the ingredients of language definitions in an algebraic and term-rewriting setting. The concepts are explained on the \(\mathbb {K}\) definition of CinK. We assume the reader is familiar with the basics of algebraic specification and rewriting. A language \(\mathcal {L}\) can be defined as a triple \((\varSigma , \mathcal {T}, \mathcal {S})\), consisting of:

  1.

    A many-sorted algebraic signature \(\varSigma \), which includes at least a sort \( Cfg \) for configurations and a sort \( Bool \) for constraint formulas. For the sake of presentation, we assume in this paper that the constraint formulas are Boolean terms built with a subsignature \(\varSigma ^{\mathsf {Bool}} \subseteq \varSigma \) including the boolean constants and operations. \(\varSigma \) may also include other subsignatures for other data sorts, depending on the language \(\mathcal {L}\) (e.g., integers, identifiers, lists, maps,...). Let \(\varSigma ^\mathsf {Data}\) denote the subsignature of \(\varSigma \) consisting of all data sorts and their operations. We assume that the sort \( Cfg \) and the syntax of \(\mathcal {L}\) are not data, i.e., they are defined in \(\varSigma \setminus \varSigma ^\mathsf {Data}\). Let \(T_\varSigma \) denote the \(\varSigma \)-algebra of ground terms and \(T_{\varSigma ,s}\) denote the set of ground terms of sort \(s\). Given a sort-wise infinite set of variables \( Var \), let \(T_\varSigma ( Var )\) denote the free \(\varSigma \)-algebra of terms with variables, \(T_{\varSigma ,s}( Var )\) denote the set of terms of sort \(s\) with variables, and \( var (t)\) denote the set of variables occurring in the term \(t\).

  2.

    A \(\varSigma ^\mathsf {Data}\)-model \(\mathcal {D}\), which interprets the data sorts and operations. For convenience, we assume that \(\mathcal {D}_d\subset \varSigma _{d}\) for each data sort \(d\), i.e., the constants are elements of the corresponding signature. Let \(\mathcal {T}\triangleq \mathcal {T}(\mathcal {D})\) denote the free \(\varSigma \)-model generated by \(\mathcal {D}\). The satisfaction relation \(\rho \;\models \;b\) between valuations \(\rho \) and constraint formulas \(b\in T_{\varSigma , Bool }( Var )\) is defined by \(\rho \;\models \; b\) iff \(\rho (b)= {\mathcal {D}}_{{ true }}\). For simplicity, we write \({ true },{ false }, 0, 1\ldots \) instead of \({\mathcal {D}}_{{ true }}, {\mathcal {D}}_{{ false }}, {\mathcal {D}}_0, {\mathcal {D}}_1, \ldots \).

  3.

    A set \(\mathcal {S}\) of rewrite rules. Each rule is a pair of the form \({l}\pmb {\wedge }{b}\;\pmb {\Rightarrow }\;{r} \), where \(l,r\in T_{\varSigma , Cfg }( Var )\) are the rule’s left-hand-side and right-hand-side, respectively, and \(b\in T_{\varSigma , Bool }( Var )\) is the condition. The formal definitions for rules and for the transition system defined by them are given below.

Remark 1

For the sake of presentation, here we consider only “pure” language definitions, where the semantics is given only by semantic rules between configurations. Some definitions may include additional functions defined by equations. For such cases the language definition may additionally include a set of axioms \(A_0\), e.g., associativity and/or commutativity of some functions, and a set of equations \(E_0\). Then the model \(\mathcal {T}\) is the free algebra modulo \(A_0\cup E_0\). We believe that the approach presented in this paper can be extended to these more involved definitions, but this requires more investigation and is left for future work.

We now formally introduce the notions required for defining semantic rules.

Definition 1

(pattern [12]). A pattern is an expression of the form \({\pi }\;\pmb {\wedge }\;{b}\), where \(\pi \in T_{\varSigma , Cfg }( Var )\) is a basic pattern and \(b\in T_{\varSigma , Bool }( Var )\). If \(\gamma \in T_ Cfg \) and \(\rho \,{:} Var \rightarrow \mathcal {T}\) then we write \((\gamma ,\rho )\;\models \;{\pi }\;\pmb {\wedge }\;{b}\) iff \(\gamma =\rho (\pi )\) and \(\rho \;\models \; b\).

A basic pattern \(\pi \) defines a set of (concrete) configurations, and the condition \(b\) gives additional constraints these configurations must satisfy.

Remark 2

The above definition is a particular case of a definition in [12]. There, a pattern is a first-order logic formula with configuration terms as sub-formulas. In this paper we keep the conjunction notation from first-order logic but separate basic patterns from constraints. Note that first-order formulas can be encoded as terms of sort \(Bool\), where the quantifiers become constructors. The satisfaction relation \(\models \) is then defined, for such terms, like the usual FOL satisfaction.

We identify basic patterns \(\pi \) with patterns \({\pi }\;\pmb {\wedge }\;{{ true }}\). Sample patterns are and .

Definition 2

(rule, transition system). A rule is a pair of patterns of the form \({l}\pmb {\wedge }{b}\;\pmb {\Rightarrow }\;{r} \) (note that \(r\) is in fact the pattern \({r}\;\pmb {\wedge }\;{{ true }}\)). Any set \(\mathcal {S}\) of rules defines a labelled transition system \((\mathcal {T}_ Cfg , \Rightarrow _{\mathcal {S}})\) such that \(\gamma \mathop {\Longrightarrow }\limits ^{\alpha }\mathop {_{\mathcal {S}}}\limits ^{}\gamma '\) iff there exist \(\alpha \triangleq ({l}\pmb {\wedge }{b}\;\pmb {\Rightarrow }\;{r} ) \in \mathcal {S}\) and \(\rho : Var \rightarrow \mathcal {T}\) such that \((\gamma ,\rho )\;\models \; {l}\;\pmb {\wedge }\;{b}\) and \((\gamma ',\rho )\;\models \;r\).
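Definition 2 can be read operationally: a rule fires on \(\gamma \) when some valuation \(\rho \) matches the left-hand side against \(\gamma \) and satisfies the condition. The Python sketch below illustrates this reading under simplifying assumptions that are not part of the paper: terms are nested tuples, variables are capitalised strings, sorts are replaced by `isinstance` checks inside the condition, and the division rule (with an assumed non-zero side condition) is a hypothetical stand-in for a rule of \(\mathcal {S}\).

```python
def is_var(t):
    """Assumption: variables are strings starting with an upper-case letter."""
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, term, rho=None):
    """Return a valuation rho with rho(pattern) == term, or None."""
    rho = dict(rho or {})
    if is_var(pattern):
        if pattern in rho and rho[pattern] != term:
            return None
        rho[pattern] = term
        return rho
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            rho = match(p, t, rho)
            if rho is None:
                return None
        return rho
    return rho if pattern == term else None


# A rule is (label, l, b, r): b and r are functions of the valuation.
# Integer division uses // and is only meant for non-negative operands here.
RULES = [
    ("divide",
     ("cfg", ("/", "I1", "I2"), "C"),
     lambda rho: isinstance(rho["I1"], int) and isinstance(rho["I2"], int)
                 and rho["I2"] != 0,
     lambda rho: ("cfg", rho["I1"] // rho["I2"], rho["C"])),
]


def step(rules, gamma):
    """All labelled successors (alpha, gamma') of the configuration gamma."""
    successors = []
    for label, lhs, cond, rhs in rules:
        rho = match(lhs, gamma)          # (gamma, rho) |= l
        if rho is not None and cond(rho):  # rho |= b
            successors.append((label, rhs(rho)))  # gamma' = rho(r)
    return successors


print(step(RULES, ("cfg", ("/", 6, 3), "done")))
```

Matching is done at the top of the configuration only, in line with the rules being topmost rewrite rules.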

3.2 Symbolic Execution

We briefly recap our approach to symbolic execution from [2]. The main idea is to automatically generate a new definition \(({\varSigma }^{\mathfrak {s}},{\mathcal {T}}^{\mathfrak {s}},{\mathcal {S}}^{\mathfrak {s}})\) for a language \({\mathcal {L}}^{\mathfrak {s}}\) from a given definition \((\varSigma , \mathcal {T}, \mathcal {S})\) of a language \(\mathcal {L}\). The new language \({\mathcal {L}}^{\mathfrak {s}}\) has the same syntax, and its semantics extends \(\mathcal {L}\)’s data domains with symbolic values and adapts the semantical rules of \(\mathcal {L}\) to deal with the new domains.

Let \({V}^{\mathfrak {s}}\) denote an infinite, data sort-wise set of symbolic values, disjoint from \( Var \) and from symbols in \(\varSigma \). The data algebra is extended to \({\mathcal {D}}^{\mathfrak {s}}\), which is the algebra of ground terms over the signature \(\varSigma ^\mathsf {Data}({V}^{\mathfrak {s}})\).

Remark 3

The approach in [2] allows some freedom in choosing the algebra \({\mathcal {D}}^{\mathfrak {s}}\), to enable the use of decision procedures for handling symbolic artifacts.

The signature \({\varSigma }^{\mathfrak {s}}\) extends \(\varSigma \) with the symbolic values \({V}^{\mathfrak {s}}\) as constants, a new sort \({ Cfg }^{\mathfrak {s}}\) and a constructor \({\_}\;\pmb {\wedge }\;{\_}: Cfg \times Bool \rightarrow { Cfg }^{\mathfrak {s}}\). The model \({\mathcal {T}}^{\mathfrak {s}}\) is defined as being the free \({\varSigma }^{\mathfrak {s}}\)-model generated by \({\mathcal {D}}^{\mathfrak {s}}\), similarly to how \(\mathcal {T}\) is built over \(\mathcal {D}\). The ground terms \({\pi }\;\pmb {\wedge }\;{\phi }\in {\mathcal {T}}^{\mathfrak {s}}_{{ Cfg }^{\mathfrak {s}}}\) are called symbolic configurations. Let \([\![{\pi }\;\pmb {\wedge }\;{\phi }]\!]\) denote the set of concrete configurations \(\{\gamma \mid (\exists \rho )\,(\gamma ,\rho )\;\models \; {\pi }\;\pmb {\wedge }\;{\phi }\}\).

Thanks to the rule transformation procedure presented in [2], we can assume without loss of generality that the basic patterns in the left-hand sides of rules do not contain operations on data and that the rules are left-linear. Concrete semantic rules \({{l}\;\pmb {\wedge }\;{b}}\;\pmb {\Rightarrow }\;{r} \in \mathcal {S}\) are then systematically transformed into rules

$$\begin{aligned} {{ l}\;\pmb {\wedge }\;{\psi }}\;\pmb {\Rightarrow }\;{{r}\;\pmb {\wedge }\;{(\psi \wedge b)}} \end{aligned}$$
(5)

where \(\psi \in Var \) is a fresh variable of sort \(Bool\) playing the role of a path condition. This means that symbolic rules are applied like concrete rules, except for the fact that the current path condition \(\psi \) is enriched with the rule’s condition \(b\).
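As a worked instance of this transformation, assume that the side condition of the division rule is \(I_2 \ne _{Int} 0\) (an assumption made here for illustration, since the rule itself appears only in Fig. 3; cells other than \({\langle \rangle }{}_\mathsf{k}\) are omitted). The concrete rule (first line) and its symbolic counterpart (second line) would then read:

$$\begin{aligned}&{\langle {I_1\mathtt{/} I_2\curvearrowright C}\rangle }{}_\mathsf{k}\;\pmb {\wedge }\;{I_2 \ne _{Int} 0}\;\pmb {\Rightarrow }\;{\langle {I_1/_{Int}I_2\curvearrowright C}\rangle }{}_\mathsf{k}\\&{\langle {I_1\mathtt{/} I_2\curvearrowright C}\rangle }{}_\mathsf{k}\;\pmb {\wedge }\;{\psi }\;\pmb {\Rightarrow }\;{\langle {I_1/_{Int}I_2\curvearrowright C}\rangle }{}_\mathsf{k}\;\pmb {\wedge }\;{(\psi \wedge I_2 \ne _{Int} 0)} \end{aligned}$$

Applied to a symbolic configuration \({\langle {A\mathtt{/} B\curvearrowright C}\rangle }{}_\mathsf{k}\;\pmb {\wedge }\;{ true }\), where \(A\) and \(B\) are hypothetical symbolic values, the symbolic rule yields \({\langle {A/_{Int}B\curvearrowright C}\rangle }{}_\mathsf{k}\;\pmb {\wedge }\;{( true \wedge B \ne _{Int} 0)}\): the rule matches without inspecting the value of \(B\), and the requirement that \(B\) be non-zero is recorded in the path condition instead.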

Then, the symbolic execution of \({\mathcal {L}}\) programs is the concrete execution of the corresponding \({\mathcal {L}}^{\mathfrak {s}}\) programs, i.e., the application of the rewrite rules in the semantics of \({\mathcal {L}}^{\mathfrak {s}}\). Building the definition of \({\mathcal {L}}^{\mathfrak {s}}\) amounts to extending the signature \(\varSigma \) to a symbolic signature \({\varSigma }^{\mathfrak {s}}\), extending the \(\varSigma \)-algebra \(\mathcal {T}\) to a \({\varSigma }^{\mathfrak {s}}\)-algebra \({\mathcal {T}}^{\mathfrak {s}}\), and turning the concrete rules \(\mathcal {S}\) into symbolic rules \({\mathcal {S}}^{\mathfrak {s}}\). The transition system \(({\mathcal {T}}^{\mathfrak {s}}_{{ Cfg }^{\mathfrak {s}}},\Rightarrow _{{\mathcal {S}}^{\mathfrak {s}}})\) is defined using Definitions 1 and 2 applied to \({\mathcal {L}}^{\mathfrak {s}}\). In [2] it is proved that the symbolic transition system forward-simulates the concrete one, and that the concrete transition system backward-simulates the symbolic one. These two results then imply the naturally expected properties of symbolic execution.

Theorem 1

(Coverage [2]). For every concrete execution \(\gamma _0 \mathop {\Longrightarrow }\limits ^{\alpha _1}\mathop {_{\mathcal {S}}}\limits ^{} \gamma _1 \mathop {\Longrightarrow }\limits ^{\alpha _2}\mathop {_{\mathcal {S}}}\limits ^{} \cdots \mathop {\Longrightarrow }\limits ^{\alpha _n}\mathop {_{\mathcal {S}}}\limits ^{} \gamma _n \mathop {\Longrightarrow }\limits ^{\alpha _{n+1}}\mathop {_{\mathcal {S}}}\limits ^{} \cdots \) there is a symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \mathop {\Longrightarrow }\limits ^{\alpha _1}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} {\pi _1}\;\pmb {\wedge }\;{\phi _1} \mathop {\Longrightarrow }\limits ^{\alpha _2}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} \cdots \mathop {\Longrightarrow }\limits ^{\alpha _n}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \mathop {\Longrightarrow }\limits ^{\alpha _{n+1}}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

A symbolic configuration \({\pi }\;\pmb {\wedge }\;{\phi }\in {\mathcal {T}}^{\mathfrak {s}}_{{ Cfg }^{\mathfrak {s}}}\) is satisfiable if there is a valuation \(\vartheta :{V}^{\mathfrak {s}}\rightarrow \mathcal {D}\) such that \(\vartheta \;\models \; \phi \) (which is equivalent to \([\![{\pi }\;\pmb {\wedge }\;{\phi }]\!]\not =\emptyset \)). We call a symbolic execution feasible if all its configurations are satisfiable.

Theorem 2

(Precision [2]). For every feasible symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \mathop {\Longrightarrow }\limits ^{\alpha _1}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} {\pi _1}\;\pmb {\wedge }\;{\phi _1} \mathop {\Longrightarrow }\limits ^{\alpha _2}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} \cdots \mathop {\Longrightarrow }\limits ^{\alpha _n}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \mathop {\Longrightarrow }\limits ^{\alpha _{n+1}}\mathop {_{{\mathcal {S}}^{\mathfrak {s}}}}\limits ^{} \cdots \) there is a concrete execution \(\gamma _0 \mathop {\Longrightarrow }\limits ^{\alpha _1}\mathop {_{\mathcal {S}}}\limits ^{} \gamma _1 \mathop {\Longrightarrow }\limits ^{\alpha _2}\mathop {_{\mathcal {S}}}\limits ^{} \cdots \mathop {\Longrightarrow }\limits ^{\alpha _n}\mathop {_{\mathcal {S}}}\limits ^{} \gamma _n \mathop {\Longrightarrow }\limits ^{\alpha _{n+1}}\mathop {_{\mathcal {S}}}\limits ^{} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

3.3 Rewrite Theories

A rewrite theory [3] \(\mathcal{R}=(\varSigma ,E\cup A,R)\) consists of a signature \(\varSigma \), a set of equations \(E\), a set of axioms \(A\), e.g., associativity, commutativity, unity or combinations of these, and a set of rewrite rules \(R\) of the form \(l\rightarrow r~\mathbf{if}~b\), where \(l\) and \(r\) are terms with variables and \(b\) is a term of sort Bool. We are only interested in rewrite theories \(\mathcal R\) that are executable, i.e., \((\varSigma ,E\cup A,R)\) where:

  1.

    there exists a matching algorithm modulo \(A\);

  2.

    \((\varSigma ,E\cup A)\) is ground Church-Rosser and terminating modulo \(A\) (the equations \(E\) are seen here as rewrite rules oriented from left to right). Thus, each ground term \(t\) has a canonical form \( can _{E/A}(t)\) that is unique modulo the axioms \(A\);

  3.

    \(R\) is ground coherent w.r.t. \(E\) modulo \(A\) [13]: for all \(t, t_1 \in T_\varSigma \) with \(t\rightarrow _{R/A}t_1\) there is \(t_2\in T_\varSigma \) s.t. \({ can}_{E/A}(t)\rightarrow _{R/A}t_2\) and \({ can}_{E/A}(t_1)=_A can_{E/A}(t_2)\).

The relation \(\rightarrow _{R/A}\) denotes the one-step rewriting relation defined by applying a rule from \(R\) modulo the axioms \(A\): \(u \rightarrow _{R/A} v\) iff there are terms \(u',v'\), a rule \(l\rightarrow r~\mathbf{if}~b\) in \(R\), a position \(p\) in \(u'\), and a substitution \(\sigma \) such that \(u =_A u'\), \(v=_A v'\), \(u'|_p=\sigma (l)\), \(v' = u'[\sigma (r)]_p\), and \(\sigma (b) =_A { true }\). Here \(u'|_p\) denotes the subterm of \(u'\) at position \(p\), and \(u'[t]_p\) denotes the term obtained from \(u'\) by replacing that subterm with \(t\).

The rewriting relation \(\rightarrow _\mathcal {R}\) defined by an executable rewrite theory \(\mathcal {R}\) is: \(t_1\rightarrow _\mathcal {R}t_2\) iff \( can _{E/A}(t_1)\rightarrow _{R/A}t'_2\) and \( can _{E/A}(t'_2) = t_2\). This is equivalent to \(\rightarrow _{R/(E\cup A)}\) due to confluence and coherence. We write \(t_1\!\xrightarrow {\alpha }_\mathcal {R}\!t_2\) to emphasise that \(\alpha \triangleq (l\rightarrow r~\mathbf{if}~b)\,\in \,R\) is applied in the rewriting step \( can _{E/A}(t_1)\!\rightarrow _{R/A}\!t'_2\).

4 Translating Language Definitions into Rewrite Theories

This section includes the main contribution of the paper. We introduce two encodings of language definitions as rewrite theories: a faithful encoding and an approximate encoding. Since the symbolic extension of a language is also a language definition, we automatically get encodings of both concrete languages and their symbolic extensions. We investigate how the properties relating a language definition and its symbolic extension are reflected on their respective encodings.

Definition 3

(faithful encoding). Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition. The faithful encoding of \(\mathcal {L}\) is \(\mathcal {R}(\mathcal {L})=(\varSigma ,E\cup A,R)\), where

  • \(A=\emptyset \);

  • for each operation \(f\) in \(\varSigma ^\mathsf {Data}\) and \(d_1,\ldots ,d_n\in \mathcal {D}\) of corresponding sorts, \(E\) includes an equation \(f(d_1,\ldots ,d_n)=\mathcal {D}_f(d_1,\dots ,d_n)\);

  • \(R=\mathcal {S}\), where each rule \({{l}\;\pmb {\wedge }\;{b}}\;\pmb {\Rightarrow }\;{r} \in \mathcal {S}\) becomes a rewrite rule \((l\rightarrow r~\mathbf{if}~b)\in R\).
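The interplay between the data equations \(E\) and the rules \(R\) in such an encoding can be sketched in Python. The sketch makes several assumptions for illustration only: terms are nested tuples, the data operations are the three entries of the hypothetical table `DATA_OPS`, and the single rule `rule_plus` mimics the addition rule of Sect. 2 on a simplified two-argument configuration.

```python
import operator

# Hypothetical data signature: operation name -> interpretation D_f.
DATA_OPS = {"+Int": operator.add, "*Int": operator.mul, "<=Int": operator.le}


def can(t):
    """Canonical form w.r.t. E: every data operation applied to data
    constants d1..dn is replaced by D_f(d1, ..., dn), bottom-up."""
    if not isinstance(t, tuple) or not t:
        return t
    head, *args = t
    args = [can(a) for a in args]
    if head in DATA_OPS and all(isinstance(a, (int, bool)) for a in args):
        return DATA_OPS[head](*args)
    return (head, *args)


def rule_plus(t):
    """Topmost rule: cfg(I1 + I2, C) -> cfg(I1 +Int I2, C)."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "cfg":
        _, task, rest = t
        if (isinstance(task, tuple) and len(task) == 3 and task[0] == "+"
                and all(isinstance(i, int) for i in task[1:])):
            return ("cfg", ("+Int", task[1], task[2]), rest)
    return None


def step(t):
    """t1 -> t2 iff can(t1) rewrites with a rule to t2' and can(t2') = t2."""
    t2 = rule_plus(can(t))
    return None if t2 is None else can(t2)


print(step(("cfg", ("+", 1, ("+Int", 1, 1)), "C")))
```

Here `can` plays the role of \( can _{E/A}\) with \(A=\emptyset \): it terminates because every evaluation strictly shrinks the term, and it is deterministic because each data operation has a single interpretation.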

Theorem 3

Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition. Then \(\mathcal {R}(\mathcal {L})\) is an executable rewrite theory satisfying \(\gamma \mathop {\Longrightarrow }\limits ^{\alpha }\mathop {_{\mathcal {S}}}\limits ^{}\gamma '\) iff \(\gamma \xrightarrow {\alpha }_{\mathcal {R}(\mathcal {L})}\gamma '\), for all \(\gamma ,\gamma '\in \mathcal {T}_ Cfg \).

Remark 4

The construction of the rewrite theory \(\mathcal {R}(\mathcal {L})\), with data domain \(\mathcal {D}\subseteq \varSigma ^\mathsf {Data}\) defined by the set of equations \(E\) given in Definition 3, corresponds to the data domains \(\mathcal {D}\) being builtin sorts in the Maude terminology. A builtin sort is a sort that is not built algebraically but one that, for efficiency reasons, is directly implemented in code (C++ code in the case of Maude). For example, natural numbers are specified by the equational specification \(0:\mathsf {Nat}, s: \mathsf {Nat} \rightarrow \mathsf {Nat}\), but using the resulting unary notation for them would be highly inefficient. This is why natural numbers are implemented as builtins. The construction \(\mathcal {R}(\mathcal {L})\) can, however, be extended to accommodate non-builtin sorts, i.e., sorts that are defined as the initial model of a finite set of equations \(E'\) that are confluent and terminating modulo a set \(A\) of axioms. For this, it is enough to ensure that \(E' \cup E\) is also confluent and terminating modulo \(A\) - where \(E\) is the set of equations given in the proof of Theorem 3. This typically happens, as \(E\) and \(E'\) refer to different sorts - the builtin ones for the former, and the non-builtin ones for the latter. If this is the case then the proof of the ground coherence property in Theorem 3 still holds, because it only depends on \(E' \cup E\) being confluent and terminating modulo \(A\), not on the particular form of the equations. The proof of faithfulness of the encoding remains the same. This observation is important, since it ensures that we obtain executable Maude rewrite theories \(\mathcal {R}(\mathcal {L})\) for language definitions \(\mathcal {L}\) whose data are specified using either builtin sorts or non-builtin sorts.
The faithfulness of the encoding then ensures that all results of reachability analyses (either positive or negative) performed on \(\mathcal {R}(\mathcal {L})\), e.g., obtained using Maude’s search command, also hold on \(\mathcal {L}\).
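What a reachability analysis over the transition system amounts to can be sketched as a breadth-first search in Python. This is a generic illustration and not Maude's implementation of the search command; the function `search`, the state bound `max_states`, and the toy successor function `succ` are hypothetical.

```python
from collections import deque


def search(initial, successors, goal, max_states=10_000):
    """Breadth-first reachability: return a path from `initial` to a state
    satisfying `goal`, or None if no such state is reachable."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                if len(parent) >= max_states:
                    # A bounded search cannot justify a negative answer.
                    raise RuntimeError("state bound exceeded; inconclusive")
                parent[nxt] = state
                queue.append(nxt)
    return None  # the whole reachable state space was explored


def succ(n):
    """Toy transition relation with finitely many reachable states."""
    return [n + 1, 2 * n] if n <= 20 else []


print(search(1, succ, lambda n: n == 10))   # a positive result: a witness path
print(search(1, succ, lambda n: n == 0))    # a negative result: None
```

A returned path is a positive result, while `None` is a negative result obtained only after exhausting the reachable states. With a faithful encoding both kinds of answer transfer to the language definition, because the two transition systems coincide.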

The symbolic extension of a language definition can be encoded as a rewrite theory as well. Let \({\mathcal {L}}^{\mathfrak {s}}=({\varSigma }^{\mathfrak {s}},{\mathcal {T}}^{\mathfrak {s}},{\mathcal {S}}^{\mathfrak {s}})\) be the symbolic extension of \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\). Recall that \({\varSigma }^{\mathfrak {s}}\) is \(\varSigma \) extended with the constructor of symbolic configurations \({\_}\;\pmb {\wedge }\;{\_}\) and with the symbolic values \({V}^{\mathfrak {s}}\) seen as constants. The symbolic configurations are ground terms \({\pi }\;\pmb {\wedge }\;{\phi }\in {\mathcal {T}}^{\mathfrak {s}}_{{ Cfg }^{\mathfrak {s}}}\). If \(\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})=({\varSigma }^{\mathfrak {s}},E\cup A,R)\) is the faithful encoding given by Theorem 3, then \(E=A=\emptyset \) because the data algebra \({\mathcal {D}}^{\mathfrak {s}}\) we considered is the \(\varSigma ^\mathsf{Data}({V}^{\mathfrak {s}})\)-algebra of the ground terms built over \(\mathcal {D}\) and \({V}^{\mathfrak {s}}\). Recall that we assumed that \(\mathcal {D}\subseteq \varSigma \subseteq \varSigma ^\mathsf{Data}({V}^{\mathfrak {s}})\).

The relationship between a language definition \(\mathcal {L}\) and its symbolic extension \({\mathcal {L}}^{\mathfrak {s}}\) can be now reflected at the level of the encodings \(\mathcal {R}(\mathcal {L})\) and \(\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})\). A symbolic configuration \({\pi }\;\pmb {\wedge }\;{\phi }\) consists of a configuration ground term \(\pi \) (of sort \( Cfg \)) and a formula ground term \(\phi \) (of sort \( Bool \)). The constants \({V}^{\mathfrak {s}}\) play the role of logical variables, and the definition of satisfiability for patterns extends to their representations as symbolic configurations. Moreover, the notion of feasible execution in \(\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})\) is defined similarly to how it is defined for \({\mathcal {L}}^{\mathfrak {s}}\). The following two results are direct consequences of Theorems 3, 1, and 2, respectively.

Corollary 1

(Coverage for Encoding Rewrite Theories). For every concrete execution \(\gamma _0 \xrightarrow {\alpha _1}_{\mathcal {R}(\mathcal {L})} \gamma _1 \xrightarrow {\alpha _2}_{\mathcal {R}(\mathcal {L})} \cdots \xrightarrow {\alpha _n}_{\mathcal {R}(\mathcal {L})} \gamma _n \xrightarrow {\alpha _{n+1}}_{\mathcal {R}(\mathcal {L})} \cdots \) there is a symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \xrightarrow {\alpha _1}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})}{\pi _1}\;\pmb {\wedge }\;{\phi _1} \xrightarrow {\alpha _2}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \xrightarrow {\alpha _n}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \xrightarrow {\alpha _{n+1}}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

Corollary 2

(Precision for Encoding Rewrite Theories). For every feasible symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \xrightarrow {\alpha _1}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})}{\pi _1}\;\pmb {\wedge }\;{\phi _1} \xrightarrow {\alpha _2}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \xrightarrow {\alpha _n}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \xrightarrow {\alpha _{n+1}}_{\mathcal {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \) there is a concrete execution \(\gamma _0 \xrightarrow {\alpha _1}_{\mathcal {R}(\mathcal {L})} \gamma _1 \xrightarrow {\alpha _2}_{\mathcal {R}(\mathcal {L})} \cdots \xrightarrow {\alpha _n}_{\mathcal {R}(\mathcal {L})} \gamma _n \xrightarrow {\alpha _{n+1}}_{\mathcal {R}(\mathcal {L})} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

The faithful encoding thus enjoys nice theoretical properties, but it has limited practical value when we consider actual \(\mathbb {K}\) definitions of nontrivial languages:

  • The heating and cooling rules, which are symmetric to each other, may lead to infinite rewrite sequences;

  • The generated state space may be very large, even for small programs.

There are currently two proposals for obtaining abstractions of the rewrite theories: equational abstraction [9] or transforming some semantical rules into equations [6].

The former amounts to basically deriving a new definition, where the new model \(\mathcal {T}\) is the quotient of the original one, usually requiring substantial input from the user, which is something we would like to avoid.

The latter might not be suitable for language definitions in general because, semantically, it would equate elements that are supposed to be distinct in \(\mathcal {T}\). Consider a language construct randBool with two rules: randBool => true and randBool => false. Assume now that we want to analyze a program that uses randBool, but that fails to satisfy a given property regardless of whether randBool transits to true or to false. In this case it might be beneficial to collapse the state space by considering only one of the cases; however, if we transform the two rules above into equations, this will semantically identify true and false in \(\mathcal {T}\), collapsing much more of the state space than desirable. An additional operational concern is that transforming certain rules into equations might destroy coherence and/or confluence, thus violating the executability requirements.
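
The difference between the two readings of the randBool rules can be illustrated by the following Python sketch. It is only an illustration under simplifying assumptions: terms are atomic strings, the function names are hypothetical, and equations are modelled by the equivalence relation they generate (computed with a small union-find), with no congruence closure over larger terms.

```python
def reachable(start, rules):
    """States reachable from `start` when the pairs in `rules` are read as
    directed rewrite rules lhs => rhs."""
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for lhs, rhs in rules:
            if lhs == state and rhs not in seen:
                seen.add(rhs)
                todo.append(rhs)
    return seen


def equivalence_classes(terms, equations):
    """Partition `terms` by the equivalence generated when the same pairs
    are read as equations lhs = rhs (a small union-find)."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for lhs, rhs in equations:
        parent[find(lhs)] = find(rhs)
    classes = {}
    for t in terms:
        classes.setdefault(find(t), set()).add(t)
    return list(classes.values())


RULES = [("randBool", "true"), ("randBool", "false")]

# As rules: both outcomes are reachable, but true and false stay distinct.
print(reachable("randBool", RULES) == {"randBool", "true", "false"})  # True
print(reachable("true", RULES) == {"true"})                           # True

# As equations: randBool, true, and false end up in a single class.
print(len(equivalence_classes(["randBool", "true", "false"], RULES)))  # 1
```

Read as rules, the pairs only say that randBool may step to either value; read as equations, they force true and false into the same class, which is the unwanted collapse described above.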

Two-layered rewrite theories, introduced below, allow us to preserve the benefits of the techniques above (state space reduction, efficient execution), while avoiding their semantical consequences (unnecessary collapse of states in the semantical model \(\mathcal {T}\)).

Definition 4

A two-layered rewrite theory is a tuple \(\mathfrak {R}=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\), where \((\varSigma ,E\cup A, 1R \cup 2R )\) is an executable rewrite theory, \(E\cup 1R \) is ground terminating modulo \(A\), and \(\varepsilon : T_{\varSigma } \rightarrow T_{\varSigma }\) is a function that, for any \(t \in T_{\varSigma }\), returns an element in the set of \((E\cup 1R )/A\)-irreducible terms \(\{t' \in T_{\varSigma } \mid t \rightarrow ^!_{(E\cup 1R )/A}\, t'\}\) (which is nonempty precisely because \( E\cup 1R \) is ground terminating modulo \(A\)). The one-step rewrite relation \(\twoheadrightarrow _{\mathfrak {R}}\) is defined by \(t_1\twoheadrightarrow _{\mathfrak {R}}t_2\) iff \(\varepsilon (t_1) \rightarrow _{ 2R /A} t_2'\) and \(can_{E/A}(t_2')=_A t_2\).
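
The one-step relation of Definition 4 can be sketched in Python as follows. This is a minimal illustration under strong simplifying assumptions, not the construction used by the tool: \(E\) and \(A\) are taken to be empty (so \(can_{E/A}\) is the identity), terms are plain integers, each rule is a hypothetical function returning one successor or None, and \(\varepsilon \) always picks the first applicable first-layer rule.

```python
def normalize(term, layer1):
    """Plays the role of epsilon: exhaustively apply first-layer rules,
    always choosing the first applicable one.
    Assumes the first layer is terminating."""
    while True:
        for rule in layer1:
            successor = rule(term)
            if successor is not None:
                term = successor
                break
        else:
            # No first-layer rule applies: the term is irreducible.
            return term


def two_layer_step(term, layer1, layer2):
    """All successors of `term` in one two-layer step: normalize with the
    first layer, then apply a single second-layer rule."""
    normal_form = normalize(term, layer1)
    return [s for rule in layer2 if (s := rule(normal_form)) is not None]


# Hypothetical toy rules on integers.
def halve(n):   # first layer: terminating on positive integers
    return n // 2 if n > 0 and n % 2 == 0 else None

def dec(n):     # second layer
    return n - 1 if n > 1 else None

def inc3(n):    # second layer
    return n + 3 if n > 1 else None


print(normalize(12, [halve]))                    # 3  (12 -> 6 -> 3)
print(two_layer_step(12, [halve], [dec, inc3]))  # [2, 6]
```

Note that the results 2 and 6 are not themselves first-layer normal forms; as in the definition, normalization happens at the beginning of the next step.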

Theorem 4

Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition and \(\mathfrak {R}(\mathcal {L})=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\) be a two-layered rewrite theory with \((\varSigma ,E\cup A, 1R \cup 2R )\) built as in Definition 3 but where the set of rules is partitioned into two subsets \( 1R \) and \( 2R \) and \( E\cup 1R \) is terminating modulo \(A\). If \(\gamma \twoheadrightarrow _{\mathfrak {R}(\mathcal {L})}\gamma '\) then \(\gamma \Rightarrow _{\mathcal {S}}^+\gamma '\).

We say that \(\mathfrak {R}(\mathcal {L})\) is an approximate encoding of \(\mathcal {L}\).

Corollary 3

(precision for approximate encoding). Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition and \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\) be an approximate encoding of \({\mathcal {L}}^{\mathfrak {s}}\). For each feasible symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})}{\pi _1}\;\pmb {\wedge }\;{\phi _1} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \) there is a concrete execution in \(\mathcal {L}\): \(\gamma _0 \mathop {\Longrightarrow }\limits ^{\alpha _1}\mathop {_{\mathcal {S}}}\limits ^{+} \gamma _1 \mathop {\Longrightarrow }\limits ^{\alpha _2}\mathop {_{\mathcal {S}}}\limits ^{+} \cdots \mathop {\Longrightarrow }\limits ^{\alpha _n}\mathop {_{\mathcal {S}}}\limits ^{+} \gamma _n \mathop {\Longrightarrow }\limits ^{\alpha _{n+1}}\mathop {_{\mathcal {S}}}\limits ^{+} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

An interesting and practically relevant question is whether the coverage/precision relationships between \(\mathcal {L}\) and \({\mathcal {L}}^{\mathfrak {s}}\) can be reflected on the level of the approximate encodings as two-layered rewrite theories. To investigate these relationships, we have to find a way to define an approximate two-layered rewrite theory \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\) that extends a given approximate two-layered rewrite theory \(\mathfrak {R}(\mathcal {L})\). A first attempt is to define \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})=({\varSigma }^{\mathfrak {s}}, E\cup A, { 1R }^{\mathfrak {s}} \cup { 2R }^{\mathfrak {s}},{\varepsilon }^{\mathfrak {s}})\) from \(\mathfrak {R}(\mathcal {L})\) in the same way \({\mathcal {L}}^{\mathfrak {s}}\) is obtained from \(\mathcal {L}\), but this is not enough to obtain a coverage-like result. The program log in Fig. 4 is deterministic and terminating for each \(\vartheta (A)\in Int\). So we may execute any instance of it with an approximate encoding \(\mathfrak {R}\) having no second-layer rules, i.e., \( 2R =\emptyset \). If \({ 2R }^{\mathfrak {s}}=\emptyset \), then \({ 1R }^{\mathfrak {s}}\) is nonterminating, because there is an infinite execution corresponding to the case when the value of the program variable X in the current configuration is always greater than zero. Another problem is to specify how the strategy \(\varepsilon \) is extended to \({\varepsilon }^{\mathfrak {s}}\). Since it is hard to give general answers to these questions, we opted for a particular solution that can be implemented in Maude.

Definition 5

(symbolic approximate encoding). Let \({\mathcal {L}}^{\mathfrak {s}}=({\varSigma }^{\mathfrak {s}},{\mathcal {T}}^{\mathfrak {s}},{\mathcal {S}}^{\mathfrak {s}})\) be the symbolic extension of \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) and \(\mathfrak {R}(\mathcal {L})=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\) an approximate encoding of \(\mathcal {L}\). We assume that there is a total order relation \(\prec \) over \( 1R \) such that:

  1. the rewrite \(t \rightarrow _{(E\cup 1R )/A}^! \varepsilon (t)\) uses the minimal rule from \( 1R \) w.r.t. \(\prec \) whenever such a rule is applicable;

  2. if \(\alpha \) is unconditional and \(\alpha '\) is conditional then \(\alpha \prec \alpha '\).

We let the approximate encoding of \({\mathcal {L}}^{\mathfrak {s}}\) be \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})=({\varSigma }^{\mathfrak {s}},E\cup A, { 1R }^{\mathfrak {s}}\cup { 2R }^{\mathfrak {s}},{\varepsilon }^{\mathfrak {s}})\), where:

  • \({ 1R }^{\mathfrak {s}} = \{ {\alpha }^{\mathfrak {s}} \mid \alpha \in 1R , \alpha ~\mathrm{unconditional }\}\);

  • \({ 2R }^{\mathfrak {s}} = \{ {\alpha }^{\mathfrak {s}} \mid \alpha \in 1R , \alpha ~\mathrm{conditional }\} \cup \{ {\alpha }^{\mathfrak {s}} \mid \alpha \in 2R \}\);

  • \({\alpha }^{\mathfrak {s}}\,{\prec }^{\mathfrak {s}}\,{\alpha '}^{\mathfrak {s}}\) iff \(\alpha \prec \alpha '\);

  • \({\varepsilon }^{\mathfrak {s}}\) uses the minimal rule from \({ 1R }^{\mathfrak {s}}\) w.r.t. \({\prec }^{\mathfrak {s}}\).
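
The way Definition 5 redistributes the rules between the two layers can be sketched in Python. This is only an illustration: the rule names are hypothetical, a rule is reduced to its name and a flag recording whether the original rule is conditional, the list order of the first layer stands for \(\prec \), and the symbolic transformation \(\alpha \mapsto {\alpha }^{\mathfrak {s}}\) is represented merely by renaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditional: bool  # whether the original (concrete) rule has a condition


def respects_condition_2(layer1):
    """Check that, in the listed order, every unconditional rule precedes
    every conditional one (condition 2 on the total order)."""
    flags = [rule.conditional for rule in layer1]
    return flags == sorted(flags)  # False sorts before True


def symbolic_layers(layer1, layer2):
    """Split the rules as in Definition 5. `layer1` is assumed to be listed
    in increasing order; the order is inherited by the symbolic first layer."""
    def sym(rule):
        return Rule(rule.name + "^s", rule.conditional)

    layer1_s = [sym(r) for r in layer1 if not r.conditional]
    layer2_s = [sym(r) for r in layer1 if r.conditional] + [sym(r) for r in layer2]
    return layer1_s, layer2_s


# Hypothetical rule sets.
layer1 = [Rule("assign", False), Rule("if-true", True), Rule("if-false", True)]
layer2 = [Rule("division", True)]

layer1_s, layer2_s = symbolic_layers(layer1, layer2)
print(respects_condition_2(layer1))  # True
print([r.name for r in layer1_s])    # ['assign^s']
print([r.name for r in layer2_s])    # ['if-true^s', 'if-false^s', 'division^s']
```

The sketch makes visible why the symbolic first layer is smaller than the concrete one: every conditional first-layer rule is moved to the second layer.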

Theorem 5

(coverage for approximate rewrite theories). Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition and \(\mathfrak {R}(\mathcal {L})=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\) be an approximate encoding of \(\mathcal {L}\). For every concrete execution \(\gamma _0 \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \gamma _1 \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \cdots \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \gamma _n \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \cdots \) there is a symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \xrightarrow {}^+_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})}{\pi _1}\;\pmb {\wedge }\;{\phi _1} \xrightarrow {}^+_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \xrightarrow {}^+_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \xrightarrow {}^+_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

However, the precision relationship between \(\mathfrak {R}(\mathcal {L})\) and \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\) does not hold in general. The reason is that \({ 1R }^{\mathfrak {s}}\) has fewer rules than \( 1R \) and hence the representative-selection strategy \({\varepsilon }^{\mathfrak {s}}\) is weaker than \(\varepsilon \). Therefore there is no guarantee that the concrete execution given by Corollary 3 is the same as the one chosen by the strategy \(\varepsilon \). If the strategy \({\varepsilon }^{\mathfrak {s}}\) is the “isomorphic image” of \(\varepsilon \) via the transformation \(\bullet \mapsto {\bullet }^{\mathfrak {s}}\), then the precision result holds:

Theorem 6

(precision for approximate rewrite theories). Let \(\mathcal {L}=(\varSigma ,\mathcal {T},\mathcal {S})\) be a language definition and \(\mathfrak {R}(\mathcal {L})=(\varSigma ,E\cup A, 1R \cup 2R ,\varepsilon )\) be an approximate encoding of \(\mathcal {L}\) such that \( 1R \) includes only unconditional rules (hence \({ 1R }^{\mathfrak {s}}=\{{\alpha }^{\mathfrak {s}}\mid \alpha \in 1R \}\)). For every feasible symbolic execution \({\pi _0}\;\pmb {\wedge }\;{\phi _0} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})}{\pi _1}\;\pmb {\wedge }\;{\phi _1} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} {\pi _n}\;\pmb {\wedge }\;{\phi _n} \xrightarrow {}_{\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})} \cdots \) there is a concrete one \(\gamma _0 \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \gamma _1 \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \cdots \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \gamma _n \xrightarrow {}_{\mathfrak {R}(\mathcal {L})} \cdots \) such that \(\gamma _i \in [\![{\pi _i}\;\pmb {\wedge }\;{\phi _i}]\!]\) for \(i = 0, 1, \ldots \).

5 Implementing the \(\mathbb {K}\) Framework in Maude

The current implementation of the \(\mathbb {K}\) framework uses Maude as a rewrite engine. In [4], the framework, at that time called K-Maude, was presented as an extension of Maude consisting of several meta-transformations which gradually translate \(\mathbb {K}\) modules into executable Maude modules. In the current version of \(\mathbb {K}\) we use a compiler for language definitions where each of these meta-transformations is actually a separate compilation step. Through compilation, \(\mathbb {K}\) definitions are translated into Maude rewrite theories which are then used for running/analysing programs. The main components of a \(\mathbb {K}\) definition are the syntax declarations, the configuration and the \(\mathbb {K}\) (rewrite) rules. To these, the tool automatically adds the rules generated from strictness annotations (e.g., the heating/cooling rules 1–4).

The work described in this article is concerned with how the set of rules is compiled into a two-layered rewrite theory, which is then encoded into Maude by using equations for the first-layer rules and rewrite rules for the second-layer rules. By default, all \(\mathbb {K}\) rules are translated into (conditional) equations, that is, \( 1R =\mathcal {S}\) and \( 2R =\emptyset \). This behavior can be altered by specifying (at compile time) that certain rules are to be considered transitions, which will trigger their transformation into (conditional) rewrite rules in the resulting Maude module.

To specify that a rule is a transition, one must pass the rule name as an argument for the -transition option at compilation time:

$ kompile cink.k -transition "division"

The above command specifies the rule division as a transition; thus, the rule for division is included in \( 2R \). By this command we express our intent that the tool considers the rule for division as a transition when exploring an execution’s transition system. By making it a rewrite rule in Maude, we can explore the non-determinism generated by the rule when using Maude’s search command.

Another source of non-determinism arises from strictness annotations. When the strict attribute is given to some syntactical construct, the tool chooses by default an arbitrary, fixed order to evaluate its arguments. This optimisation has the side effect of possibly losing behaviours due to missed interleavings.

Some of these missed interleavings can be restored using the -superheat option. This option is used to instruct the \(\mathbb {K}\) tool to exhaustively explore all the non-deterministic evaluation choices for the strictness of a language construct.

Once we know which rules are transitions and which are not, we can easily deduce the two sets \( 1R \) and \( 2R \), and thus we obtain the executable rewrite theory \(\mathfrak {R}(\mathcal {L})\) as discussed in Sect. 4.

The following example shows how one can explore more behaviours by specifying second-layer rules at compile time. If we compile the language definition of CinK without any options, then running the program counter (Fig. 4) will result in a single solution, where the return value is either 1 (when the tool first evaluates dec() and then inc()) or 3 (when it first evaluates inc() and then dec()). However, if we set the operation plus as superheat:

$ kompile cink -superheat "plus"

then we obtain both solutions, because the heating rule for addition can be applied in two ways and the option tells the tool to explore them both.

The symbolic transformations discussed in Sect. 3.2 are implemented as compilation steps in the \(\mathbb {K}\) compiler [2]. The tool uses the same translation to Maude discussed above in order to obtain the rewrite theory \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\). An important step in this process is that conditional rules whose conditions cannot be reduced to true are compiled as transitions, that is, they are included in \( 2R \). When performing search in Maude, these rules are essential for exploring all the execution paths, thereby ensuring the coverage property (Theorem 5). Note that none of the symbolic transformations applied by the tool to the language definition changes the initial semantics of the language.

The implementation uses a slightly modified version of Maude which includes a hook to the Z3 SMT solver [5] and a corresponding operation called checkSat. This operation receives as argument an SMT-LIB string, which is sent to the solver to check its satisfiability. The result returned by the solver is propagated back through the hook to Maude as a string, so checkSat can return “sat”, “unsat”, or “unknown”. In practice, our tool uses checkSat to reduce the search space by pruning infeasible execution paths, which is essential for preserving the precision property. To obtain \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\) from a language definition one uses the symbolic backend as follows:

$ kompile cink -backend symbolic

This command applies the symbolic transformations, moves the appropriate rules into \( 2R \), and generates the rewrite theory \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\). Using \(\mathfrak {R}({\mathcal {L}}^{\mathfrak {s}})\) one can execute programs using either concrete values or symbolic ones. However, running programs with symbolic values may lead to infinite loops when the loop conditions contain symbolic values. In such cases one can bound the number of execution paths:

$ krun log.imp -search -bound 3 -cIN=".List" -cPC="true"

This executes log (Fig. 4) symbolically, until 3 solutions are found. Each solution consists of a result configuration and a formula that constitutes the path condition. The symbolic values are represented as fresh variables with a specific sort (e.g. A:Int). These can also be passed as input at the command line of the tool as arguments of the -cIN parameter. Users can also set the initial path condition using the -cPC option. During the symbolic execution the tool applies a rule only if the next state is feasible, that is, if the solver does not report “unsat” for the conjunction of the current path condition and the new conditions imposed by the application of the rule.
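
The feasibility test described above can be sketched in Python. The sketch does not use the Maude hook or Z3: the function check_sat below is a hypothetical toy stand-in that decides satisfiability by brute force over a small finite domain (an assumption made only to keep the example self-contained), and constraints are modelled as Python predicates rather than SMT-LIB strings.

```python
from itertools import product

# Assumption: a small finite domain standing in for the sort Int.
DOMAIN = range(-8, 9)


def check_sat(constraints, variables):
    """Toy stand-in for the solver call: returns "sat" if some assignment
    over DOMAIN satisfies all constraints, and "unsat" otherwise."""
    for values in product(DOMAIN, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(constraint(env) for constraint in constraints):
            return "sat"
    return "unsat"


def feasible_successors(path_condition, branches, variables):
    """Keep a branch only if the current path condition extended with the
    branch guard is not reported as "unsat"."""
    kept = []
    for label, guard in branches:
        extended = path_condition + [guard]
        if check_sat(extended, variables) != "unsat":
            kept.append((label, extended))
    return kept


# Current path condition: A > 0 (A is a symbolic value).
path_condition = [lambda env: env["A"] > 0]
branches = [
    ("then", lambda env: env["A"] > 1),
    ("else", lambda env: env["A"] <= 1),
    ("dead", lambda env: env["A"] < 0),  # contradicts the path condition
]

kept = feasible_successors(path_condition, branches, ["A"])
print([label for label, _ in kept])  # ['then', 'else']
```

A real solver may also answer “unknown”; the test `!= "unsat"` keeps such branches, matching the rule stated above that a step is taken unless the extended path condition is known to be unsatisfiable.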

6 Conclusion and Related Work

We presented some results that relate language definitions to different kinds of rewrite theories, which encode the language definitions both faithfully and approximately. The results show how (symbolic) analyses performed on a rewrite theory are reflected on the corresponding language definition. The general results are applied to the current implementation of \(\mathbb {K}\) language definitions in Maude.

The faithful encoding of \(\mathbb {K}\) language definitions as rewrite theories is relatively simple, but the resulting theory is not efficient in practice. Therefore we extended the notion of rewrite theory in order to work with under-approximations of the language definitions (and implicitly of the rewrite theories). The approximating theories are more efficient and flexible (the user has the freedom to work with various levels of approximation), but their use for program analysis must be done with care because they do not preserve all the behavioural properties. The coverage/precision results proved in this paper can help the user in correctly assessing which analyses hold on which representations.

Related Work. \(\mathbb {K}\) started as a methodology for defining the semantics of programming languages in Maude. The first tool supporting \(\mathbb {K}\) [4] was written in Maude’s meta-level, as a series of transformations translating \(\mathbb {K}\) definitions into Maude programs. Then the \(\mathbb {K}\) compiler became a more complex tool that translates a \(\mathbb {K}\) definition into an intermediate language, which is then used to generate code for various backends, including Maude. A presentation of this tool is given in [8]. There, a brief description of the semantics of \(\mathbb {K}\) definitions is also included. The programming-language definition framework presented here in Sect. 3 is a specialised case of that definition.

The coverage and precision properties, which relate the faithful rewrite-theory encoding of a language and of that language’s symbolic version, are analogous to the soundness and completeness results in [10], which relate usual rewriting and rewriting modulo SMT. An interesting alternative to defining symbolic executions as executions in a transformed language (as we do in [2]) would be to compile a language into a rewriting-modulo-SMT Maude module.

Our construction of two-layered rewrite theories has some similarities with equational abstractions [9] and with the state-space reduction techniques obtained by transforming rules into equations presented in [6]. However, our first-layer rewrite rules do not equate states as Maude equations do; their semantics is that of transformation, not of equality. Therefore these rules do not have to satisfy the executability and property-preservation requirements of [6, 9].