wneuper/isa: doc-src/Ref/syntax.tex@361a71713176

     1 %% $Id$

     2 \chapter{Syntax Transformations} \label{chap:syntax}

     3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}

     4 \newcommand\mtt[1]{\mbox{\tt #1}}

     5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}

     6 \newcommand\Constant{\ttfct{Constant}}

     7 \newcommand\Variable{\ttfct{Variable}}

     8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}

     9 \index{syntax!transformations|(}

    11 This chapter is intended for experienced Isabelle users who need to define

    12 macros or code their own translation functions.  It describes the

    13 transformations between parse trees, abstract syntax trees and terms.

    16 \section{Abstract syntax trees} \label{sec:asts}

    17 \index{ASTs|(}

    19 The parser, given a token list from the lexer, applies productions to yield

    20 a parse tree\index{parse trees}.  By applying some internal transformations

    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro

    22 expansion, further translations and finally type inference yields a

    23 well-typed term.  The printing process is the reverse, except for some

    24 subtleties to be discussed later.

    26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.

    27 Much of the complexity is due to the macro mechanism.  Using macros, you

    28 can specify most forms of concrete syntax without writing any \ML{} code.

    30 \begin{figure}

    31 \begin{center}

    32 \begin{tabular}{cl}

    33 string          & \\

    34 $\downarrow$    & parser \\

    35 parse tree      & \\

    36 $\downarrow$    & parse \AST{} translation \\

    37 \AST{}             & \\

    38 $\downarrow$    & \AST{} rewriting (macros) \\

    39 \AST{}             & \\

    40 $\downarrow$    & parse translation, type inference \\

    41 --- well-typed term --- & \\

    42 $\downarrow$    & print translation \\

    43 \AST{}             & \\

    44 $\downarrow$    & \AST{} rewriting (macros) \\

    45 \AST{}             & \\

    46 $\downarrow$    & print \AST{} translation, printer \\

    47 string          &

    48 \end{tabular}

    50 \end{center}

    51 \caption{Parsing and printing}\label{fig:parse_print}

    52 \end{figure}

    54 Abstract syntax trees are an intermediate form between the raw parse trees

    55 and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or

    56 variable) or a list of {\em at least two\/} subtrees.  Internally, they

    57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}

    58 \index{*Appl}

    59 \begin{ttbox}

    60 datatype ast = Constant of string

    61              | Variable of string

    62              | Appl of ast list

    63 \end{ttbox}

    64 %

    65 Isabelle uses an S-expression syntax for abstract syntax trees.  Constant

    66 atoms are shown as quoted strings, variable atoms as non-quoted strings and

    67 applications as a parenthesized list of subtrees.  For example, the \AST

    68 \begin{ttbox}

    69 Appl [Constant "_constrain",

    70       Appl [Constant "_abs", Variable "x", Variable "t"],

    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]

    72 \end{ttbox}

    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.

    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few

    75 subtrees.

    77 The resemblance to Lisp's S-expressions is intentional, but there are two

    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the

    79 names {\tt Constant} and {\tt Variable} too literally; in the later

    80 translation to terms, $\Variable x$ may become a constant, free or bound

    81 variable, even a type constructor or class name; the actual outcome depends

    82 on the context.

    84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the

    85 application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of

    86 application is determined later by context; it could be a type constructor

    87 applied to types.

    89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are

    90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later

    91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and

    92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term

    93 constructor \ttindex{Bound}).

    96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}

    97 \index{ASTs!made from parse trees}

    98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}

   100 The parse tree is the raw output of the parser.  Translation functions,

   101 called {\bf parse AST translations}\indexbold{translations!parse AST},

   102 transform the parse tree into an abstract syntax tree.

   104 The parse tree is constructed by nesting the right-hand sides of the

   105 productions used to recognize the input.  Such parse trees are simply lists

   106 of tokens and constituent parse trees, the latter representing the

   107 nonterminals of the productions.  Let us refer to the actual productions in

   108 the form displayed by {\tt Syntax.print_syntax}.

   110 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s

   111 by stripping out delimiters and copy productions.  More precisely, the

   112 mapping $\astofpt{-}$ is derived from the productions as follows:

   113 \begin{itemize}

   114 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},

   115   \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.

   117 \item Copy productions:\index{productions!copy}

   118   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for

   119   strings of delimiters, which are discarded.  $P$ stands for the single

   120   constituent that is not a delimiter; it is either a nonterminal symbol or

   121   a name token.

   123   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.

   124     Here there are no constituents other than delimiters, which are

   125     discarded.

   127   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and

   128     the remaining constituents $P@1$, \ldots, $P@n$ are built into an

   129     application whose head constant is~$c$:

   130     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =

   131        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}

   132     \]

   133 \end{itemize}

   134 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},

   135 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.

   136 These examples illustrate the need for further translations to make \AST{}s

   137 closer to the typed $\lambda$-calculus.  The Pure syntax provides

   138 predefined parse \AST{} translations\index{translations!parse AST} for

   139 ordinary applications, type applications, nested abstractions, meta

   140 implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their

   141 effect on some representative input strings.

   144 \begin{figure}

   145 \begin{center}

   146 \tt\begin{tabular}{ll}

   147 \rm input string    & \rm \AST \\\hline

   148 "f"                 & f \\

   149 "'a"                & 'a \\

   150 "t == u"            & ("==" t u) \\

   151 "f(x)"              & ("_appl" f x) \\

   152 "f(x, y)"           & ("_appl" f ("_args" x y)) \\

   153 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\

   154 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\

   155 \end{tabular}

   156 \end{center}

   157 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}

   158 \end{figure}

   160 \begin{figure}

   161 \begin{center}

   162 \tt\begin{tabular}{ll}

   163 \rm input string            & \rm \AST{} \\\hline

   164 "f(x, y, z)"                & (f x y z) \\

   165 "'a ty"                     & (ty 'a) \\

   166 "('a, 'b) ty"               & (ty 'a 'b) \\

   167 "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\

   168 "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\

   169 "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\

   170 "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))

   171 \end{tabular}

   172 \end{center}

   173 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}

   174 \end{figure}

   176 The names of constant heads in the \AST{} control the translation process.

   177 The list of constants invoking parse \AST{} translations appears in the

   178 output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.

   181 \section{Transforming \AST{}s to terms}\label{sec:termofast}

   182 \index{terms!made from ASTs}

   183 \newcommand\termofast[1]{\lbrakk#1\rbrakk}

   185 The \AST{}, after application of macros (see \S\ref{sec:macros}), is

   186 transformed into a term.  This term is probably ill-typed since type

   187 inference has not occurred yet.  The term may contain type constraints

   188 consisting of applications with head {\tt "_constrain"}; the second

   189 argument is a type encoded as a term.  Type inference later introduces

   190 correct types or rejects the input.

   192 Another set of translation functions, namely parse

   193 translations\index{translations!parse}, may affect this process.  If we

   194 ignore parse translations for the time being, then \AST{}s are transformed

   195 to terms by mapping \AST{} constants to constants, \AST{} variables to

   196 schematic or free variables and \AST{} applications to applications.

   198 More precisely, the mapping $\termofast{-}$ is defined by

   199 \begin{itemize}

   200 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,

   201   \mtt{dummyT})$.

   203 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =

   204   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$

   205   the index extracted from~$xi$.

   207 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,

   208   \mtt{dummyT})$.

   210 \item Function applications with $n$ arguments:

   211     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =

   212        \termofast{f} \ttapp

   213          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}

   214     \]

   215 \end{itemize}

   216 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and

   217 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},

   218 while \ttindex{dummyT} stands for some dummy type that is ignored during

   219 type inference.

   221 So far the outcome is still a first-order term.  Abstractions and bound

   222 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced

   223 by parse translations.  Such translations are attached to {\tt "_abs"},

   224 {\tt "!!"} and user-defined binders.

   227 \section{Printing of terms}

   228 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}

   230 The output phase is essentially the inverse of the input phase.  Terms are

   231 translated via abstract syntax trees into strings.  Finally the strings are

   232 pretty printed.

   234 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of

   235 terms into \AST{}s.  Ignoring those, the transformation maps

   236 term constants, variables and applications to the corresponding constructs

   237 on \AST{}s.  Abstractions are mapped to applications of the special

   238 constant {\tt _abs}.

   240 More precisely, the mapping $\astofterm{-}$ is defined as follows:

   241 \begin{itemize}

   242   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.

   244   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,

   245     \tau)$.

   247   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable

   248     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of

   249     the {\tt indexname} $(x, i)$.

   251   \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant

   252     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$

   253     be obtained from~$t$ by replacing all bound occurrences of~$x$ by

   254     the free variable $x'$.  This replaces corresponding occurrences of the

   255     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',

   256     \mtt{dummyT})$:

   257    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =

   258       \Appl{\Constant \mtt{"_abs"},

   259         constrain(\Variable x', \tau), \astofterm{t'}}.

   260     \]

   262   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.

   263     The occurrence of constructor \ttindex{Bound} should never happen

   264     when printing well-typed terms; it indicates a de Bruijn index with no

   265     matching abstraction.

   267   \item Where $f$ is not an application,

   268     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =

   269        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}

   270     \]

   271 \end{itemize}

   272 %

   273 Type constraints are inserted to allow the printing of types.  This is

   274 governed by the boolean variable \ttindex{show_types}:

   275 \begin{itemize}

   276   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or

   277     \ttindex{show_types} not set to {\tt true}.

   279   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,

   280          \astofterm{\tau}}$ \ otherwise.

   282     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type

   283     constructors go to {\tt Constant}s; type identifiers go to {\tt

   284       Variable}s; type applications go to {\tt Appl}s with the type

   285     constructor as the first element.  If \ttindex{show_sorts} is set to

   286     {\tt true}, some type variables are decorated with an \AST{} encoding

   287     of their sort.

   288 \end{itemize}

   289 %

   290 The \AST{}, after application of macros (see \S\ref{sec:macros}), is

   291 transformed into the final output string.  The built-in {\bf print AST

   292   translations}\indexbold{translations!print AST} effectively reverse the

   293 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.

   295 For the actual printing process, the names attached to productions

   296 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play

   297 a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or

   298 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production

   299 for~$c$.  Each argument~$x@i$ is converted to a string, and put in

   300 parentheses if its priority~$(p@i)$ requires this.  The resulting strings

   301 and their syntactic sugar (denoted by \dots{} above) are joined to make a

   302 single string.

   304 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the

   305 corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots

   306 x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with

   307 non-constant head or without a corresponding production are printed as

   308 $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of

   309 $\Variable x$ is simply printed as~$x$.

   311 Blanks are {\em not\/} inserted automatically.  If blanks are required to

   312 separate tokens, specify them in the mixfix declaration, possibly preceded

   313 by a slash~({\tt/}) to allow a line break.

   314 \index{ASTs|)}

   318 \section{Macros: Syntactic rewriting} \label{sec:macros}

   319 \index{macros|(}\index{rewriting!syntactic|(}

   321 Mixfix declarations alone can handle situations where there is a direct

   322 connection between the concrete syntax and the underlying term.  Sometimes

   323 we require a more elaborate concrete syntax, such as quantifiers and list

   324 notation.  Isabelle's {\bf macros} and {\bf translation functions} can

   325 perform translations such as

   326 \begin{center}\tt

   327   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}

   328     ALL x:A.P   & Ball(A, \%x.P)        \\ \relax

   329     [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))

   330   \end{tabular}

   331 \end{center}

   332 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they

   333 are the most powerful translation mechanism but are difficult to read or

   334 write.  Macros are specified by first-order rewriting systems that operate

   335 on abstract syntax trees.  They are usually easy to read and write, and can

   336 express all but the most obscure translations.

   338 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set

   339 theory.\footnote{This and the following theories are complete working

   340   examples, though they specify only syntax, no axioms.  The file {\tt

   341     ZF/zf.thy} presents the full set theory definition, including many

   342   macro rules.}  Theory {\tt SET} defines constants for set comprehension

   343 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal

   344 quantification ({\tt Ball}).  Each of these binds some variables.  Without

   345 additional syntax we should have to express $\forall x \in A.  P$ as {\tt

   346   Ball(A,\%x.P)}, and similarly for the others.

   348 \begin{figure}

   349 \begin{ttbox}

   350 SET = Pure +

   351 types

   352   i, o

   353 arities

   354   i, o :: logic

   355 consts

   356   Trueprop      :: "o => prop"              ("_" 5)

   357   Collect       :: "[i, i => o] => i"

   358   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")

   359   Replace       :: "[i, [i, i] => o] => i"

   360   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")

   361   Ball          :: "[i, i => o] => o"

   362   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)

   363 translations

   364   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"

   365   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"

   366   "ALL x:A. P"  == "Ball(A, \%x. P)"

   367 end

   368 \end{ttbox}

   369 \caption{Macro example: set theory}\label{fig:set_trans}

   370 \end{figure}

   372 The theory specifies a variable-binding syntax through additional

   373 productions that have mixfix declarations.  Each non-copy production must

   374 specify some constant, which is used for building \AST{}s.

   375 \index{constants!syntactic} The additional constants are decorated with

   376 {\tt\at} to stress their purely syntactic purpose; they should never occur

   377 within the final well-typed terms.  Furthermore, they cannot be written in

   378 formulae because they are not legal identifiers.

   380 The translations cause the replacement of external forms by internal forms

   381 after parsing, and vice versa before printing of terms.  As a specification

   382 of the set theory notation, they should be largely self-explanatory.  The

   383 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},

   384 appear implicitly in the macro rules via their mixfix forms.

   386 Macros can define variable-binding syntax because they operate on \AST{}s,

   387 which have no inbuilt notion of bound variable.  The macro variables {\tt

   388   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,

   389 in this case bound variables.  The macro variables {\tt P} and~{\tt Q}

   390 range over formulae containing bound variable occurrences.

   392 Other applications of the macro system can be less straightforward, and

   393 there are peculiarities.  The rest of this section will describe in detail

   394 how Isabelle macros are preprocessed and applied.

   397 \subsection{Specifying macros}

   398 Macros are basically rewrite rules on \AST{}s.  But unlike other macro

   399 systems found in programming languages, Isabelle's macros work in both

   400 directions.  Therefore a syntax contains two lists of rewrites: one for

   401 parsing and one for printing.

   403 \index{*translations section}

   404 The {\tt translations} section specifies macros.  The syntax for a macro is

   405 \[ (root)\; string \quad

   406    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}

   407    \right\} \quad

   408    (root)\; string

   409 \]

   410 %

   411 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both

   412 ({\tt ==}).  The two strings specify the left and right-hand sides of the

   413 macro rule.  The $(root)$ specification is optional; it specifies the

   414 nonterminal for parsing the $string$ and if omitted defaults to {\tt

   415   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:

   416 \begin{itemize}

   417 \item Rules must be left linear: $l$ must not contain repeated variables.

   419 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =

   420   (\mtt"c\mtt" ~ x@1 \ldots x@n)$.

   422 \item Every variable in~$r$ must also occur in~$l$.

   423 \end{itemize}

   425 Macro rules may refer to any syntax from the parent theories.  They may

   426 also refer to anything defined before the the {\tt .thy} file's {\tt

   427   translations} section --- including any mixfix declarations.

   429 Upon declaration, both sides of the macro rule undergo parsing and parse

   430 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo

   431 macro expansion.  The lexer runs in a different mode that additionally

   432 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},

   433 {\tt _K}).  Thus, a constant whose name starts with an underscore can

   434 appear in macro rules but not in ordinary terms.

   436 Some atoms of the macro rule's \AST{} are designated as constants for

   437 matching.  These are all names that have been declared as classes, types or

   438 constants.

   440 The result of this preprocessing is two lists of macro rules, each stored

   441 as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}

   442 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For

   443 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are

   444 \begin{ttbox}

   445 parse_rules:

   446   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))

   447   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))

   448   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))

   449 print_rules:

   450   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)

   451   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)

   452   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)

   453 \end{ttbox}

   455 \begin{warn}

   456   Avoid choosing variable names that have previously been used as

   457   constants, types or type classes; the {\tt consts} section in the output

   458   of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works

   459   incorrectly, inspect its internal form as shown above, recalling that

   460   constants appear as quoted strings and variables without quotes.

   461 \end{warn}

   463 \begin{warn}

   464 If \ttindex{eta_contract} is set to {\tt true}, terms will be

   465 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some

   466 abstraction nodes needed for print rules to match may vanish.  For example,

   467 \verb|Ball(A, %x. P(x))| contracts {\tt Ball(A, P)}; the print rule does

   468 not apply and the output will be {\tt Ball(A, P)}.  This problem would not

   469 occur if \ML{} translation functions were used instead of macros (as is

   470 done for binder declarations).

   471 \end{warn}

   474 \begin{warn}

   475 Another trap concerns type constraints.  If \ttindex{show_types} is set to

   476 {\tt true}, bound variables will be decorated by their meta types at the

   477 binding place (but not at occurrences in the body).  Matching with

   478 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y

   479 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to

   480 appear in the external form, say \verb|{y::i:A::i. P::o}|.

   482 To allow such constraints to be re-read, your syntax should specify bound

   483 variables using the nonterminal~\ndx{idt}.  This is the case in our

   484 example.  Choosing {\tt id} instead of {\tt idt} is a common error,

   485 especially since it appears in former versions of most of Isabelle's

   486 object-logics.

   487 \end{warn}

   491 \subsection{Applying rules}

   492 As a term is being parsed or printed, an \AST{} is generated as an

   493 intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is

   494 normalized by applying macro rules in the manner of a traditional term

   495 rewriting system.  We first examine how a single rule is applied.

   497 Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some

   498 translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an

   499 instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex

   500 matched by $l$ may be replaced by the corresponding instance of~$r$, thus

   501 {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf

   502   place-holders} that may occur in rule patterns but not in ordinary

   503 \AST{}s; {\tt Variable} atoms serve this purpose.

   505 The matching of the object~$u$ by the pattern~$l$ is performed as follows:

   506 \begin{itemize}

   507   \item Every constant matches itself.

   509   \item $\Variable x$ in the object matches $\Constant x$ in the pattern.

   510     This point is discussed further below.

   512   \item Every \AST{} in the object matches $\Variable x$ in the pattern,

   513     binding~$x$ to~$u$.

   515   \item One application matches another if they have the same number of

   516     subtrees and corresponding subtrees match.

   518   \item In every other case, matching fails.  In particular, {\tt

   519       Constant}~$x$ can only match itself.

   520 \end{itemize}

   521 A successful match yields a substitution that is applied to~$r$, generating

   522 the instance that replaces~$u$.

   524 The second case above may look odd.  This is where {\tt Variable}s of

   525 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not

   526 far removed from parse trees; at this level it is not yet known which

   527 identifiers will become constants, bounds, frees, types or classes.  As

   528 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as

   529 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and

   530 \ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s

   531 generated from terms for printing, all constants and type constructors

   532 become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a

   533 messy mixture of {\tt Variable}s and {\tt Constant}s.  This is

   534 insignificant at macro level because matching treats them alike.

   536 Because of this behaviour, different kinds of atoms with the same name are

   537 indistinguishable, which may make some rules prone to misbehaviour.  Example:

   538 \begin{ttbox}

   539 types

   540   Nil

   541 consts

   542   Nil     :: "'a list"

   543   "[]"    :: "'a list"    ("[]")

   544 translations

   545   "[]"    == "Nil"

   546 \end{ttbox}

   547 The term {\tt Nil} will be printed as {\tt []}, just as expected.

   548 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be

   549 expected!

   551 Normalizing an \AST{} involves repeatedly applying macro rules until none

   552 is applicable.  Macro rules are chosen in the order that they appear in the

   553 {\tt translations} section.  You can watch the normalization of \AST{}s

   554 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to

   555 {\tt true}.\index{tracing!of macros} Alternatively, use

   556 \ttindex{Syntax.test_read}.  The information displayed when tracing

   557 includes the \AST{} before normalization ({\tt pre}), redexes with results

   558 ({\tt rewrote}), the normal form finally reached ({\tt post}) and some

   559 statistics ({\tt normalize}).  If tracing is off,

   560 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable

   561 printing of the normal form and statistics only.

   564 \subsection{Example: the syntax of finite sets}

   565 \index{examples!of macros}

   567 This example demonstrates the use of recursive macros to implement a

   568 convenient notation for finite sets.

   569 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}

   570 \index{"@Enum@{\tt\at Enum} constant}

   571 \index{"@Finset@{\tt\at Finset} constant}

   572 \begin{ttbox}

   573 FINSET = SET +

   574 types

   575   is

   576 consts

   577   ""            :: "i => is"                ("_")

   578   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")

   579   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")

   580   insert        :: "[i, i] => i"

   581   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")

   582 translations

   583   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"

   584   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"

   585 end

   586 \end{ttbox}

   587 Finite sets are internally built up by {\tt empty} and {\tt insert}.  The

   588 declarations above specify \verb|{x, y, z}| as the external representation

   589 of

   590 \begin{ttbox}

   591 insert(x, insert(y, insert(z, empty)))

   592 \end{ttbox}

   593 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt

   594   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}

   595 allows a line break after the comma for \rmindex{pretty printing}; if no

   596 line break is required then a space is printed instead.

   598 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}

   599 declaration.  Hence {\tt is} is not a logical type and no default

   600 productions are added.  If we had needed enumerations of the nonterminal

   601 {\tt logic}, which would include all the logical types, we could have used

   602 the predefined nonterminal symbol \ndx{args} and skipped this part

   603 altogether.  The nonterminal~{\tt is} can later be reused for other

   604 enumerations of type~{\tt i} like lists or tuples.

   606 \index{"@Finset@{\tt\at Finset} constant}

   607 Next follows {\tt empty}, which is already equipped with its syntax

   608 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic

   609 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt

   610   i} enclosed in curly braces.  Remember that a pair of parentheses, as in

   611 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.

   613 The translations may look strange at first.  Macro rules are best

   614 understood in their internal forms:

   615 \begin{ttbox}

   616 parse_rules:

   617   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))

   618   ("{\at}Finset" x)  ->  ("insert" x "empty")

   619 print_rules:

   620   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))

   621   ("insert" x "empty")  ->  ("{\at}Finset" x)

   622 \end{ttbox}

   623 This shows that \verb|{x, xs}| indeed matches any set enumeration of at least

   624 two elements, binding the first to {\tt x} and the rest to {\tt xs}.

   625 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.

   626 The parse rules only work in the order given.

   628 \begin{warn}

   629   The \AST{} rewriter cannot discern constants from variables and looks

   630   only for names of atoms.  Thus the names of {\tt Constant}s occurring in

   631   the (internal) left-hand side of translation rules should be regarded as

   632   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or

   633   sufficiently long and strange names.  If a bound variable's name gets

   634   rewritten, the result will be incorrect; for example, the term

   635 \begin{ttbox}

   636 \%empty insert. insert(x, empty)

   637 \end{ttbox}

   638   gets printed as \verb|%empty insert. {x}|.

   639 \end{warn}

   642 \subsection{Example: a parse macro for dependent types}\label{prod_trans}

   643 \index{examples!of macros}

   645 As stated earlier, a macro rule may not introduce new {\tt Variable}s on

   646 the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;

   647 if allowed, it could cause variable capture.  In such cases you usually

   648 must fall back on translation functions.  But a trick can make things

   649 readable in some cases: {\em calling translation functions by parse

   650   macros}:

   651 \begin{ttbox}

   652 PROD = FINSET +

   653 consts

   654   Pi            :: "[i, i => i] => i"

   655   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)

   656   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)

   657 \ttbreak

   658 translations

   659   "PROD x:A. B" => "Pi(A, \%x. B)"

   660   "A -> B"      => "Pi(A, _K(B))"

   661 end

   662 ML

   663   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];

   664 \end{ttbox}

   666 Here {\tt Pi} is an internal constant for constructing general products.

   667 Two external forms exist: the general case {\tt PROD x:A.B} and the

   668 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}

   669 does not depend on~{\tt x}.

   671 The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|

   672 due to a parse translation associated with \cdx{_K}.  The order of the

   673 parse rules is critical.  Unfortunately there is no such trick for

   674 printing, so we have to add a {\tt ML} section for the print translation

   675 \ttindex{dependent_tr'}.

   677 Recall that identifiers with a leading {\tt _} are allowed in translation

   678 rules, but not in ordinary terms.  Thus we can create \AST{}s containing

   679 names that are not directly expressible.

   681 The parse translation for {\tt _K} is already installed in Pure, and {\tt

   682 dependent_tr'} is exported by the syntax module for public use.  See

   683 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.

   684 \index{macros|)}\index{rewriting!syntactic|)}

   687 \section{Translation functions} \label{sec:tr_funs}

   688 \index{translations|(}

   689 %

   690 This section describes the translation function mechanism.  By writing

   691 \ML{} functions, you can do almost everything with terms or \AST{}s during

   692 parsing and printing.  The logic \LK\ is a good example of sophisticated

   693 transformations between internal and external representations of

   694 associative sequences; here, macros would be useless.

   696 A full understanding of translations requires some familiarity

   697 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},

   698 {\tt Syntax.ast} and the encodings of types and terms as such at the various

   699 stages of the parsing or printing process.  Most users should never need to

   700 use translation functions.

   702 \subsection{Declaring translation functions}

   703 There are four kinds of translation functions.  Each such function is

   704 associated with a name, which triggers calls to it.  Such names can be

   705 constants (logical or syntactic) or type constructors.

   707 {\tt Syntax.print_syntax} displays the sets of names associated with the

   708 translation functions of a {\tt Syntax.syntax} under

   709 \ttindex{parse_ast_translation}, \ttindex{parse_translation},

   710 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can

   711 add new ones via the {\tt ML} section\index{*ML section} of

   712 a {\tt .thy} file.  There may never be more than one function of the same

   713 kind per name.  Conceptually, the {\tt ML} section should appear between

   714 {\tt consts} and {\tt translations}; newly installed translation functions

   715 are already effective when macros and logical rules are parsed.

   717 The {\tt ML} section is copied verbatim into the \ML\ file generated from a

   718 {\tt .thy} file.  Definitions made here are accessible as components of an

   719 \ML\ structure; to make some definitions private, use an \ML{} {\tt local}

   720 declaration.  The {\tt ML} section may install translation functions by

   721 declaring any of the following identifiers:

   722 \begin{ttbox}

   723 val parse_ast_translation : (string * (ast list -> ast)) list

   724 val print_ast_translation : (string * (ast list -> ast)) list

   725 val parse_translation     : (string * (term list -> term)) list

   726 val print_translation     : (string * (term list -> term)) list

   727 \end{ttbox}

   729 \subsection{The translation strategy}

   730 All four kinds of translation functions are treated similarly.  They are

   731 called during the transformations between parse trees, \AST{}s and terms

   732 (recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form

   733 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function

   734 $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}

   735 function call $f \mtt[ x@1, \ldots, x@n \mtt]$.

   737 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A

   738 combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,

   739   x@n}$.  For term translations, the arguments are terms and a combination

   740 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp

   741 x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated

   742 transformations than \AST{}s do, typically involving abstractions and bound

   743 variables.

   745 Regardless of whether they act on terms or \AST{}s,

   746 parse translations differ from print translations fundamentally:

   747 \begin{description}

   748 \item[Parse translations] are applied bottom-up.  The arguments are already

   749   in translated form.  The translations must not fail; exceptions trigger

   750   an error message.

   752 \item[Print translations] are applied top-down.  They are supplied with

   753   arguments that are partly still in internal form.  The result again

   754   undergoes translation; therefore a print translation should not introduce

   755   as head the very constant that invoked it.  The function may raise

   756   exception \xdx{Match} to indicate failure; in this event it has no

   757   effect.

   758 \end{description}

   760 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and

   761 \ttindex{Const} for terms --- can invoke translation functions.  This

   762 causes another difference between parsing and printing.

   764 Parsing starts with a string and the constants are not yet identified.

   765 Only parse tree heads create {\tt Constant}s in the resulting \AST, as

   766 described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may

   767 introduce further {\tt Constant}s.  When the final \AST{} is converted to a

   768 term, all {\tt Constant}s become {\tt Const}s, as described in

   769 \S\ref{sec:termofast}.

   771 Printing starts with a well-typed term and all the constants are known.  So

   772 all logical constants and type constructors may invoke print translations.

   773 These, and macros, may introduce further constants.

   776 \subsection{Example: a print translation for dependent types}

   777 \index{examples!of translations}\indexbold{*dependent_tr'}

   779 Let us continue the dependent type example (page~\pageref{prod_trans}) by

   780 examining the parse translation for~\cdx{_K} and the print translation

   781 {\tt dependent_tr'}, which are both built-in.  By convention, parse

   782 translations have names ending with {\tt _tr} and print translations have

   783 names ending with {\tt _tr'}.  Search for such names in the Isabelle

   784 sources to locate more examples.

   786 Here is the parse translation for {\tt _K}:

   787 \begin{ttbox}

   788 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)

   789   | k_tr ts = raise TERM("k_tr",ts);

   790 \end{ttbox}

   791 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new

   792 {\tt Abs} node with a body derived from $t$.  Since terms given to parse

   793 translations are not yet typed, the type of the bound variable in the new

   794 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}

   795 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},

   796 a basic term manipulation function defined in {\tt Pure/term.ML}.

   798 Here is the print translation for dependent types:

   799 \begin{ttbox}

   800 fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =

   801       if 0 mem (loose_bnos B) then

   802         let val (x', B') = variant_abs (x, dummyT, B);

   803         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)

   804         end

   805       else list_comb (Const (r, dummyT) $ A $ B, ts)

   806   | dependent_tr' _ _ = raise Match;

   807 \end{ttbox}

   808 The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried

   809 function application during its installation.  We could set up print

   810 translations for both {\tt Pi} and {\tt Sigma} by including

   811 \begin{ttbox}\index{*ML section}

   812 val print_translation =

   813   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),

   814    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];

   815 \end{ttbox}

   816 within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,

   817 \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or

   818 $\hbox{\tt{\at}->}r(A, B)$, choosing the latter form if $B$ does not depend

   819 on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function

   820 from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away

   821 from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes

   822 referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',

   823 \mtt{dummyT})$.

   825 We must be careful with types here.  While types of {\tt Const}s are

   826 ignored, type constraints may be printed for some {\tt Free}s and

   827 {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type

   828 \ttindex{dummyT} are never printed with constraint, though.  The line

   829 \begin{ttbox}

   830         let val (x', B') = variant_abs (x, dummyT, B);

   831 \end{ttbox}\index{*variant_abs}

   832 replaces bound variable occurrences in~$B$ by the free variable $x'$ with

   833 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the

   834 correct type~{\tt T}, so this is the only place where a type

   835 constraint might appear.

   836 \index{translations|)}

   837 \index{syntax!transformations|)}

author	lcp
	Fri, 15 Apr 1994 17:16:23 +0200
changeset 323	361a71713176
child 332	01b87a921967
permissions	-rw-r--r--