wneuper/isa: doc-src/Ref/syntax.tex@e27558a68b8d

     1 %% $Id$

     2 \chapter{Syntax Transformations} \label{chap:syntax}

     3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}

     4 \newcommand\mtt[1]{\mbox{\tt #1}}

     5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}

     6 \newcommand\Constant{\ttfct{Constant}}

     7 \newcommand\Variable{\ttfct{Variable}}

     8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}

     9 \index{syntax!transformations|(}

    11 This chapter is intended for experienced Isabelle users who need to define

    12 macros or code their own translation functions.  It describes the

    13 transformations between parse trees, abstract syntax trees and terms.

    16 \section{Abstract syntax trees} \label{sec:asts}

    17 \index{ASTs|(}

    19 The parser, given a token list from the lexer, applies productions to yield

    20 a parse tree\index{parse trees}.  By applying some internal transformations

    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro

    22 expansion, further translations and finally type inference yields a

    23 well-typed term.  The printing process is the reverse, except for some

    24 subtleties to be discussed later.

    26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.

    27 Much of the complexity is due to the macro mechanism.  Using macros, you

    28 can specify most forms of concrete syntax without writing any \ML{} code.

    30 \begin{figure}

    31 \begin{center}

    32 \begin{tabular}{cl}

    33 string          & \\

    34 $\downarrow$    & lexer, parser \\

    35 parse tree      & \\

    36 $\downarrow$    & parse \AST{} translation \\

    37 \AST{}             & \\

    38 $\downarrow$    & \AST{} rewriting (macros) \\

    39 \AST{}             & \\

    40 $\downarrow$    & parse translation, type inference \\

    41 --- well-typed term --- & \\

    42 $\downarrow$    & print translation \\

    43 \AST{}             & \\

    44 $\downarrow$    & \AST{} rewriting (macros) \\

    45 \AST{}             & \\

    46 $\downarrow$    & print \AST{} translation, token translation \\

    47 string          &

    48 \end{tabular}

    50 \end{center}

    51 \caption{Parsing and printing}\label{fig:parse_print}

    52 \end{figure}

    54 Abstract syntax trees are an intermediate form between the raw parse trees

    55 and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or

    56 variable) or a list of {\em at least two\/} subtrees.  Internally, they

    57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}

    58 \index{*Appl}

    59 \begin{ttbox}

    60 datatype ast = Constant of string

    61              | Variable of string

    62              | Appl of ast list

    63 \end{ttbox}

    64 %

    65 Isabelle uses an S-expression syntax for abstract syntax trees.  Constant

    66 atoms are shown as quoted strings, variable atoms as non-quoted strings and

    67 applications as a parenthesised list of subtrees.  For example, the \AST

    68 \begin{ttbox}

    69 Appl [Constant "_constrain",

    70       Appl [Constant "_abs", Variable "x", Variable "t"],

    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]

    72 \end{ttbox}

    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.

    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few

    75 subtrees.

    77 The resemblance to Lisp's S-expressions is intentional, but there are two

    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the

    79 names {\tt Constant} and {\tt Variable} too literally; in the later

    80 translation to terms, $\Variable x$ may become a constant, free or bound

    81 variable, even a type constructor or class name; the actual outcome depends

    82 on the context.

    84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the

    85 application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of

    86 application is determined later by context; it could be a type constructor

    87 applied to types.

    89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are

    90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later

    91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and

    92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term

    93 constructor \ttindex{Bound}).

    96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}

    97 \index{ASTs!made from parse trees}

    98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}

   100 The parse tree is the raw output of the parser.  Translation functions,

   101 called {\bf parse AST translations}\indexbold{translations!parse AST},

   102 transform the parse tree into an abstract syntax tree.

   104 The parse tree is constructed by nesting the right-hand sides of the

   105 productions used to recognize the input.  Such parse trees are simply lists

   106 of tokens and constituent parse trees, the latter representing the

   107 nonterminals of the productions.  Let us refer to the actual productions in

   108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an

   109 example).

   111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s

   112 by stripping out delimiters and copy productions.  More precisely, the

   113 mapping $\astofpt{-}$ is derived from the productions as follows:

   114 \begin{itemize}

   115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},

   116   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$

   117   its associated string.  Note that for {\tt xstr} this does not include the

   118   quotes.

   120 \item Copy productions:\index{productions!copy}

   121   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for

   122   strings of delimiters, which are discarded.  $P$ stands for the single

   123   constituent that is not a delimiter; it is either a nonterminal symbol or

   124   a name token.

   126   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.

   127     Here there are no constituents other than delimiters, which are

   128     discarded.

   130   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and

   131     the remaining constituents $P@1$, \ldots, $P@n$ are built into an

   132     application whose head constant is~$c$:

   133     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =

   134        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}

   135     \]

   136 \end{itemize}

   137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},

   138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.

   139 These examples illustrate the need for further translations to make \AST{}s

   140 closer to the typed $\lambda$-calculus.  The Pure syntax provides

   141 predefined parse \AST{} translations\index{translations!parse AST} for

   142 ordinary applications, type applications, nested abstractions, meta

   143 implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their

   144 effect on some representative input strings.

   147 \begin{figure}

   148 \begin{center}

   149 \tt\begin{tabular}{ll}

   150 \rm input string    & \rm \AST \\\hline

   151 "f"                 & f \\

   152 "'a"                & 'a \\

   153 "t == u"            & ("==" t u) \\

   154 "f(x)"              & ("_appl" f x) \\

   155 "f(x, y)"           & ("_appl" f ("_args" x y)) \\

   156 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\

   157 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\

   158 \end{tabular}

   159 \end{center}

   160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}

   161 \end{figure}

   163 \begin{figure}

   164 \begin{center}

   165 \tt\begin{tabular}{ll}

   166 \rm input string            & \rm \AST{} \\\hline

   167 "f(x, y, z)"                & (f x y z) \\

   168 "'a ty"                     & (ty 'a) \\

   169 "('a, 'b) ty"               & (ty 'a 'b) \\

   170 "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\

   171 "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\

   172 "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\

   173 "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))

   174 \end{tabular}

   175 \end{center}

   176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}

   177 \end{figure}

   179 The names of constant heads in the \AST{} control the translation process.

   180 The list of constants invoking parse \AST{} translations appears in the

   181 output of {\tt print_syntax} under {\tt parse_ast_translation}.

   184 \section{Transforming \AST{}s to terms}\label{sec:termofast}

   185 \index{terms!made from ASTs}

   186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}

   188 The \AST{}, after application of macros (see \S\ref{sec:macros}), is

   189 transformed into a term.  This term is probably ill-typed since type

   190 inference has not occurred yet.  The term may contain type constraints

   191 consisting of applications with head {\tt "_constrain"}; the second

   192 argument is a type encoded as a term.  Type inference later introduces

   193 correct types or rejects the input.

   195 Another set of translation functions, namely parse

   196 translations\index{translations!parse}, may affect this process.  If we

   197 ignore parse translations for the time being, then \AST{}s are transformed

   198 to terms by mapping \AST{} constants to constants, \AST{} variables to

   199 schematic or free variables and \AST{} applications to applications.

   201 More precisely, the mapping $\termofast{-}$ is defined by

   202 \begin{itemize}

   203 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,

   204   \mtt{dummyT})$.

   206 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =

   207   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$

   208   the index extracted from~$xi$.

   210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,

   211   \mtt{dummyT})$.

   213 \item Function applications with $n$ arguments:

   214     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =

   215        \termofast{f} \ttapp

   216          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}

   217     \]

   218 \end{itemize}

   219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and

   220 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},

   221 while \ttindex{dummyT} stands for some dummy type that is ignored during

   222 type inference.

   224 So far the outcome is still a first-order term.  Abstractions and bound

   225 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced

   226 by parse translations.  Such translations are attached to {\tt "_abs"},

   227 {\tt "!!"} and user-defined binders.

   230 \section{Printing of terms}

   231 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}

   233 The output phase is essentially the inverse of the input phase.  Terms are

   234 translated via abstract syntax trees into strings.  Finally the strings are

   235 pretty printed.

   237 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of

   238 terms into \AST{}s.  Ignoring those, the transformation maps

   239 term constants, variables and applications to the corresponding constructs

   240 on \AST{}s.  Abstractions are mapped to applications of the special

   241 constant {\tt _abs}.

   243 More precisely, the mapping $\astofterm{-}$ is defined as follows:

   244 \begin{itemize}

   245   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.

   247   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,

   248     \tau)$.

   250   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable

   251     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of

   252     the {\tt indexname} $(x, i)$.

   254   \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant

   255     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$

   256     be obtained from~$t$ by replacing all bound occurrences of~$x$ by

   257     the free variable $x'$.  This replaces corresponding occurrences of the

   258     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',

   259     \mtt{dummyT})$:

   260    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =

   261       \Appl{\Constant \mtt{"_abs"},

   262         constrain(\Variable x', \tau), \astofterm{t'}}.

   263     \]

   265   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.

   266     The occurrence of constructor \ttindex{Bound} should never happen

   267     when printing well-typed terms; it indicates a de Bruijn index with no

   268     matching abstraction.

   270   \item Where $f$ is not an application,

   271     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =

   272        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}

   273     \]

   274 \end{itemize}

   275 %

   276 Type constraints\index{type constraints} are inserted to allow the printing

   277 of types.  This is governed by the boolean variable \ttindex{show_types}:

   278 \begin{itemize}

   279   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or

   280     \ttindex{show_types} is set to {\tt false}.

   282   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,

   283          \astofterm{\tau}}$ \ otherwise.

   285     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type

   286     constructors go to {\tt Constant}s; type identifiers go to {\tt

   287       Variable}s; type applications go to {\tt Appl}s with the type

   288     constructor as the first element.  If \ttindex{show_sorts} is set to

   289     {\tt true}, some type variables are decorated with an \AST{} encoding

   290     of their sort.

   291 \end{itemize}

   292 %

   293 The \AST{}, after application of macros (see \S\ref{sec:macros}), is

   294 transformed into the final output string.  The built-in {\bf print AST

   295   translations}\indexbold{translations!print AST} reverse the

   296 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.

   298 For the actual printing process, the names attached to productions

   299 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play

   300 a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or

   301 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production

   302 for~$c$.  Each argument~$x@i$ is converted to a string, and put in

   303 parentheses if its priority~$(p@i)$ requires this.  The resulting strings

   304 and their syntactic sugar (denoted by \dots{} above) are joined to make a

   305 single string.

   307 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments

   308 than the corresponding production, it is first split into

   309 $((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications

   310 with too few arguments or with non-constant head or without a

   311 corresponding production are printed as $f(x@1, \ldots, x@l)$ or

   312 $(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated

   313 with some name $c$ are tried in order of appearance.  An occurrence of

   314 $\Variable x$ is simply printed as~$x$.

   316 Blanks are {\em not\/} inserted automatically.  If blanks are required to

   317 separate tokens, specify them in the mixfix declaration, possibly preceded

   318 by a slash~({\tt/}) to allow a line break.

   319 \index{ASTs|)}

   323 \section{Macros: syntactic rewriting} \label{sec:macros}

   324 \index{macros|(}\index{rewriting!syntactic|(}

   326 Mixfix declarations alone can handle situations where there is a direct

   327 connection between the concrete syntax and the underlying term.  Sometimes

   328 we require a more elaborate concrete syntax, such as quantifiers and list

   329 notation.  Isabelle's {\bf macros} and {\bf translation functions} can

   330 perform translations such as

   331 \begin{center}\tt

   332   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}

   333     ALL x:A.P   & Ball(A, \%x.P)        \\ \relax

   334     [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))

   335   \end{tabular}

   336 \end{center}

   337 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they

   338 are the most powerful translation mechanism but are difficult to read or

   339 write.  Macros are specified by first-order rewriting systems that operate

   340 on abstract syntax trees.  They are usually easy to read and write, and can

   341 express all but the most obscure translations.

   343 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and

   344 set theory.\footnote{This and the following theories are complete

   345   working examples, though they specify only syntax, no axioms.  The

   346   file {\tt ZF/ZF.thy} presents a full set theory definition,

   347   including many macro rules.} Theory {\tt SetSyntax} declares

   348 constants for set comprehension ({\tt Collect}), replacement ({\tt

   349   Replace}) and bounded universal quantification ({\tt Ball}).  Each

   350 of these binds some variables.  Without additional syntax we should

   351 have to write $\forall x \in A.  P$ as {\tt Ball(A,\%x.P)}, and

   352 similarly for the others.

   354 \begin{figure}

   355 \begin{ttbox}

   356 SetSyntax = Pure +

   357 types

   358   i o

   359 arities

   360   i, o :: logic

   361 consts

   362   Trueprop      :: o => prop                ("_" 5)

   363   Collect       :: [i, i => o] => i

   364   Replace       :: [i, [i, i] => o] => i

   365   Ball          :: [i, i => o] => o

   366 syntax

   367   "{\at}Collect"    :: [idt, i, o] => i         ("(1{\ttlbrace}_:_./ _{\ttrbrace})")

   368   "{\at}Replace"    :: [idt, idt, i, o] => i    ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")

   369   "{\at}Ball"       :: [idt, i, o] => o         ("(3ALL _:_./ _)" 10)

   370 translations

   371   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"

   372   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"

   373   "ALL x:A. P"  == "Ball(A, \%x. P)"

   374 end

   375 \end{ttbox}

   376 \caption{Macro example: set theory}\label{fig:set_trans}

   377 \end{figure}

   379 The theory specifies a variable-binding syntax through additional productions

   380 that have mixfix declarations.  Each non-copy production must specify some

   381 constant, which is used for building \AST{}s. \index{constants!syntactic} The

   382 additional constants are decorated with {\tt\at} to stress their purely

   383 syntactic purpose; they may not occur within the final well-typed terms,

   384 being declared as {\tt syntax} rather than {\tt consts}.

   386 The translations cause the replacement of external forms by internal forms

   387 after parsing, and vice versa before printing of terms.  As a specification

   388 of the set theory notation, they should be largely self-explanatory.  The

   389 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},

   390 appear implicitly in the macro rules via their mixfix forms.

   392 Macros can define variable-binding syntax because they operate on \AST{}s,

   393 which have no inbuilt notion of bound variable.  The macro variables {\tt

   394   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,

   395 in this case bound variables.  The macro variables {\tt P} and~{\tt Q}

   396 range over formulae containing bound variable occurrences.

   398 Other applications of the macro system can be less straightforward, and

   399 there are peculiarities.  The rest of this section will describe in detail

   400 how Isabelle macros are preprocessed and applied.

   403 \subsection{Specifying macros}

   404 Macros are basically rewrite rules on \AST{}s.  But unlike other macro

   405 systems found in programming languages, Isabelle's macros work in both

   406 directions.  Therefore a syntax contains two lists of rewrites: one for

   407 parsing and one for printing.

   409 \index{*translations section}

   410 The {\tt translations} section specifies macros.  The syntax for a macro is

   411 \[ (root)\; string \quad

   412    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}

   413    \right\} \quad

   414    (root)\; string

   415 \]

   416 %

   417 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both

   418 ({\tt ==}).  The two strings specify the left and right-hand sides of the

   419 macro rule.  The $(root)$ specification is optional; it specifies the

   420 nonterminal for parsing the $string$ and if omitted defaults to {\tt

   421   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:

   422 \begin{itemize}

   423 \item Rules must be left linear: $l$ must not contain repeated variables.

   425 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =

   426   (\mtt"c\mtt" ~ x@1 \ldots x@n)$.

   428 \item Every variable in~$r$ must also occur in~$l$.

   429 \end{itemize}

   431 Macro rules may refer to any syntax from the parent theories.  They

   432 may also refer to anything defined before the current {\tt

   433   translations} section --- including any mixfix declarations.

   435 Upon declaration, both sides of the macro rule undergo parsing and parse

   436 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo

   437 macro expansion.  The lexer runs in a different mode that additionally

   438 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},

   439 {\tt _K}).  Thus, a constant whose name starts with an underscore can

   440 appear in macro rules but not in ordinary terms.

   442 Some atoms of the macro rule's \AST{} are designated as constants for

   443 matching.  These are all names that have been declared as classes, types or

   444 constants (logical and syntactic).

   446 The result of this preprocessing is two lists of macro rules, each

   447 stored as a pair of \AST{}s.  They can be viewed using {\tt

   448   print_syntax} (sections \ttindex{parse_rules} and

   449 \ttindex{print_rules}).  For theory~{\tt SetSyntax} of

   450 Fig.~\ref{fig:set_trans} these are

   451 \begin{ttbox}

   452 parse_rules:

   453   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))

   454   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))

   455   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))

   456 print_rules:

   457   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)

   458   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)

   459   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)

   460 \end{ttbox}

   462 \begin{warn}

   463   Avoid choosing variable names that have previously been used as

   464   constants, types or type classes; the {\tt consts} section in the output

   465   of {\tt print_syntax} lists all such names.  If a macro rule works

   466   incorrectly, inspect its internal form as shown above, recalling that

   467   constants appear as quoted strings and variables without quotes.

   468 \end{warn}

   470 \begin{warn}

   471 If \ttindex{eta_contract} is set to {\tt true}, terms will be

   472 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some

   473 abstraction nodes needed for print rules to match may vanish.  For example,

   474 \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does

   475 not apply and the output will be {\tt Ball(A, P)}.  This problem would not

   476 occur if \ML{} translation functions were used instead of macros (as is

   477 done for binder declarations).

   478 \end{warn}

   481 \begin{warn}

   482 Another trap concerns type constraints.  If \ttindex{show_types} is set to

   483 {\tt true}, bound variables will be decorated by their meta types at the

   484 binding place (but not at occurrences in the body).  Matching with

   485 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y

   486 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to

   487 appear in the external form, say \verb|{y::i:A::i. P::o}|.

   489 To allow such constraints to be re-read, your syntax should specify bound

   490 variables using the nonterminal~\ndx{idt}.  This is the case in our

   491 example.  Choosing {\tt id} instead of {\tt idt} is a common error.

   492 \end{warn}

   496 \subsection{Applying rules}

   497 As a term is being parsed or printed, an \AST{} is generated as an

   498 intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is

   499 normalised by applying macro rules in the manner of a traditional term

   500 rewriting system.  We first examine how a single rule is applied.

   502 Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some

   503 translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an

   504 instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex

   505 matched by $l$ may be replaced by the corresponding instance of~$r$, thus

   506 {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf

   507   place-holders} that may occur in rule patterns but not in ordinary

   508 \AST{}s; {\tt Variable} atoms serve this purpose.

   510 The matching of the object~$u$ by the pattern~$l$ is performed as follows:

   511 \begin{itemize}

   512   \item Every constant matches itself.

   514   \item $\Variable x$ in the object matches $\Constant x$ in the pattern.

   515     This point is discussed further below.

   517   \item Every \AST{} in the object matches $\Variable x$ in the pattern,

   518     binding~$x$ to~$u$.

   520   \item One application matches another if they have the same number of

   521     subtrees and corresponding subtrees match.

   523   \item In every other case, matching fails.  In particular, {\tt

   524       Constant}~$x$ can only match itself.

   525 \end{itemize}

   526 A successful match yields a substitution that is applied to~$r$, generating

   527 the instance that replaces~$u$.

   529 The second case above may look odd.  This is where {\tt Variable}s of

   530 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not

   531 far removed from parse trees; at this level it is not yet known which

   532 identifiers will become constants, bounds, frees, types or classes.  As

   533 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as

   534 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},

   535 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other

   536 hand, when \AST{}s generated from terms for printing, all constants and type

   537 constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may

   538 contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is

   539 insignificant at macro level because matching treats them alike.

   541 Because of this behaviour, different kinds of atoms with the same name are

   542 indistinguishable, which may make some rules prone to misbehaviour.  Example:

   543 \begin{ttbox}

   544 types

   545   Nil

   546 consts

   547   Nil     :: 'a list

   548 syntax

   549   "[]"    :: 'a list    ("[]")

   550 translations

   551   "[]"    == "Nil"

   552 \end{ttbox}

   553 The term {\tt Nil} will be printed as {\tt []}, just as expected.

   554 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be

   555 expected!  Guess how type~{\tt Nil} is printed?

   557 Normalizing an \AST{} involves repeatedly applying macro rules until

   558 none are applicable.  Macro rules are chosen in order of appearance in

   559 the theory definitions.  You can watch the normalization of \AST{}s

   560 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast}

   561 to {\tt true}.\index{tracing!of macros} Alternatively, use

   562 \ttindex{Syntax.test_read}.  The information displayed when tracing

   563 includes the \AST{} before normalization ({\tt pre}), redexes with

   564 results ({\tt rewrote}), the normal form finally reached ({\tt post})

   565 and some statistics ({\tt normalize}).  If tracing is off,

   566 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to

   567 enable printing of the normal form and statistics only.

   570 \subsection{Example: the syntax of finite sets}

   571 \index{examples!of macros}

   573 This example demonstrates the use of recursive macros to implement a

   574 convenient notation for finite sets.

   575 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}

   576 \index{"@Enum@{\tt\at Enum} constant}

   577 \index{"@Finset@{\tt\at Finset} constant}

   578 \begin{ttbox}

   579 FinSyntax = SetSyntax +

   580 types

   581   is

   582 syntax

   583   ""            :: i => is                  ("_")

   584   "{\at}Enum"       :: [i, is] => is            ("_,/ _")

   585 consts

   586   empty         :: i                        ("{\ttlbrace}{\ttrbrace}")

   587   insert        :: [i, i] => i

   588 syntax

   589   "{\at}Finset"     :: is => i                  ("{\ttlbrace}(_){\ttrbrace}")

   590 translations

   591   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"

   592   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"

   593 end

   594 \end{ttbox}

   595 Finite sets are internally built up by {\tt empty} and {\tt insert}.  The

   596 declarations above specify \verb|{x, y, z}| as the external representation

   597 of

   598 \begin{ttbox}

   599 insert(x, insert(y, insert(z, empty)))

   600 \end{ttbox}

   601 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt

   602   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}

   603 allows a line break after the comma for \rmindex{pretty printing}; if no

   604 line break is required then a space is printed instead.

   606 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}

   607 declaration.  Hence {\tt is} is not a logical type and may be used safely as

   608 a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be

   609 re-used for other enumerations of type~{\tt i} like lists or tuples.  If we

   610 had needed polymorphic enumerations, we could have used the predefined

   611 nonterminal symbol \ndx{args} and skipped this part altogether.

   613 \index{"@Finset@{\tt\at Finset} constant}

   614 Next follows {\tt empty}, which is already equipped with its syntax

   615 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic

   616 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt

   617   i} enclosed in curly braces.  Remember that a pair of parentheses, as in

   618 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.

   620 The translations may look strange at first.  Macro rules are best

   621 understood in their internal forms:

   622 \begin{ttbox}

   623 parse_rules:

   624   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))

   625   ("{\at}Finset" x)  ->  ("insert" x "empty")

   626 print_rules:

   627   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))

   628   ("insert" x "empty")  ->  ("{\at}Finset" x)

   629 \end{ttbox}

   630 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least

   631 two elements, binding the first to {\tt x} and the rest to {\tt xs}.

   632 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.

   633 The parse rules only work in the order given.

   635 \begin{warn}

   636   The \AST{} rewriter cannot distinguish constants from variables and looks

   637   only for names of atoms.  Thus the names of {\tt Constant}s occurring in

   638   the (internal) left-hand side of translation rules should be regarded as

   639   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or

   640   sufficiently long and strange names.  If a bound variable's name gets

   641   rewritten, the result will be incorrect; for example, the term

   642 \begin{ttbox}

   643 \%empty insert. insert(x, empty)

   644 \end{ttbox}

   645 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.

   646 \end{warn}

   649 \subsection{Example: a parse macro for dependent types}\label{prod_trans}

   650 \index{examples!of macros}

   652 As stated earlier, a macro rule may not introduce new {\tt Variable}s on

   653 the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;

   654 if allowed, it could cause variable capture.  In such cases you usually

   655 must fall back on translation functions.  But a trick can make things

   656 readable in some cases: {\em calling\/} translation functions by parse

   657 macros:

   658 \begin{ttbox}

   659 ProdSyntax = SetSyntax +

   660 consts

   661   Pi            :: [i, i => i] => i

   662 syntax

   663   "{\at}PROD"       :: [idt, i, i] => i       ("(3PROD _:_./ _)" 10)

   664   "{\at}->"         :: [i, i] => i            ("(_ ->/ _)" [51, 50] 50)

   665 \ttbreak

   666 translations

   667   "PROD x:A. B" => "Pi(A, \%x. B)"

   668   "A -> B"      => "Pi(A, _K(B))"

   669 end

   670 ML

   671   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];

   672 \end{ttbox}

   674 Here {\tt Pi} is a logical constant for constructing general products.

   675 Two external forms exist: the general case {\tt PROD x:A.B} and the

   676 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}

   677 does not depend on~{\tt x}.

   679 The second parse macro introduces {\tt _K(B)}, which later becomes

   680 \verb|%x.B| due to a parse translation associated with \cdx{_K}.

   681 Unfortunately there is no such trick for printing, so we have to add a {\tt

   682 ML} section for the print translation \ttindex{dependent_tr'}.

   684 Recall that identifiers with a leading {\tt _} are allowed in translation

   685 rules, but not in ordinary terms.  Thus we can create \AST{}s containing

   686 names that are not directly expressible.

   688 The parse translation for {\tt _K} is already installed in Pure, and {\tt

   689 dependent_tr'} is exported by the syntax module for public use.  See

   690 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.

   691 \index{macros|)}\index{rewriting!syntactic|)}

   694 \section{Translation functions} \label{sec:tr_funs}

   695 \index{translations|(}

   696 %

   697 This section describes the translation function mechanism.  By writing

   698 \ML{} functions, you can do almost everything to terms or \AST{}s

   699 during parsing and printing.  The logic \LK\ is a good example of

   700 sophisticated transformations between internal and external

   701 representations of sequents; here, macros would be useless.

   703 A full understanding of translations requires some familiarity

   704 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},

   705 {\tt Syntax.ast} and the encodings of types and terms as such at the various

   706 stages of the parsing or printing process.  Most users should never need to

   707 use translation functions.

   709 \subsection{Declaring translation functions}

   710 There are four kinds of translation functions, with one of these

   711 coming in two variants.  Each such function is associated with a name,

   712 which triggers calls to it.  Such names can be constants (logical or

   713 syntactic) or type constructors.

   715 {\tt print_syntax} displays the sets of names associated with the translation

   716 functions of a theory under \texttt{parse_ast_translation}, etc.  You can add

   717 new ones via the {\tt ML} section\index{*ML section} of a theory definition

   718 file.  There may never be more than one function of the same kind per name.

   719 Even though the {\tt ML} section is the very last part of the file, newly

   720 installed translation functions are already effective when processing all of

   721 the preceding sections.

   723 The {\tt ML} section's contents are simply copied verbatim near the

   724 beginning of the \ML\ file generated from a theory definition file.

   725 Definitions made here are accessible as components of an \ML\

   726 structure; to make some parts private, use an \ML{} {\tt local}

   727 declaration.  The {\ML} code may install translation functions by

   728 declaring any of the following identifiers:

   729 \begin{ttbox}

   730 val parse_ast_translation   : (string * (ast list -> ast)) list

   731 val print_ast_translation   : (string * (ast list -> ast)) list

   732 val parse_translation       : (string * (term list -> term)) list

   733 val print_translation       : (string * (term list -> term)) list

   734 val typed_print_translation :

   735     (string * (bool -> typ -> term list -> term)) list

   736 \end{ttbox}

   738 \subsection{The translation strategy}

   739 The different kinds of translation functions are called during the

   740 transformations between parse trees, \AST{}s and terms (recall

   741 Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form

   742 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation

   743 function $f$ of appropriate kind exists for $c$, the result is

   744 computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.

   746 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.

   747 A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,

   748   \ldots, x@n}$.  For term translations, the arguments are terms and a

   749 combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}

   750 (c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more

   751 sophisticated transformations than \AST{}s do, typically involving

   752 abstractions and bound variables. {\em Typed} print translations may

   753 even peek at the type $\tau$ of the constant they are invoked on; they

   754 are also passed the current value of the \ttindex{show_sorts} flag.

   756 Regardless of whether they act on terms or \AST{}s, translation

   757 functions called during the parsing process differ from those for

   758 printing more fundamentally in their overall behaviour:

   759 \begin{description}

   760 \item[Parse translations] are applied bottom-up.  The arguments are already

   761   in translated form.  The translations must not fail; exceptions trigger

   762   an error message.

   764 \item[Print translations] are applied top-down.  They are supplied with

   765   arguments that are partly still in internal form.  The result again

   766   undergoes translation; therefore a print translation should not introduce

   767   as head the very constant that invoked it.  The function may raise

   768   exception \xdx{Match} to indicate failure; in this event it has no

   769   effect.

   770 \end{description}

   772 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and

   773 \ttindex{Const} for terms --- can invoke translation functions.  This

   774 causes another difference between parsing and printing.

   776 Parsing starts with a string and the constants are not yet identified.

   777 Only parse tree heads create {\tt Constant}s in the resulting \AST, as

   778 described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may

   779 introduce further {\tt Constant}s.  When the final \AST{} is converted to a

   780 term, all {\tt Constant}s become {\tt Const}s, as described in

   781 \S\ref{sec:termofast}.

   783 Printing starts with a well-typed term and all the constants are known.  So

   784 all logical constants and type constructors may invoke print translations.

   785 These, and macros, may introduce further constants.

   788 \subsection{Example: a print translation for dependent types}

   789 \index{examples!of translations}\indexbold{*dependent_tr'}

   791 Let us continue the dependent type example (page~\pageref{prod_trans}) by

   792 examining the parse translation for~\cdx{_K} and the print translation

   793 {\tt dependent_tr'}, which are both built-in.  By convention, parse

   794 translations have names ending with {\tt _tr} and print translations have

   795 names ending with {\tt _tr'}.  Search for such names in the Isabelle

   796 sources to locate more examples.

   798 Here is the parse translation for {\tt _K}:

   799 \begin{ttbox}

   800 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)

   801   | k_tr ts = raise TERM ("k_tr", ts);

   802 \end{ttbox}

   803 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new

   804 {\tt Abs} node with a body derived from $t$.  Since terms given to parse

   805 translations are not yet typed, the type of the bound variable in the new

   806 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}

   807 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},

   808 a basic term manipulation function defined in {\tt Pure/term.ML}.

   810 Here is the print translation for dependent types:

   811 \begin{ttbox}

   812 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =

   813       if 0 mem (loose_bnos B) then

   814         let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in

   815           list_comb

   816             (Const (q,{\thinspace}dummyT) $ Syntax.mark_boundT (x',{\thinspace}T) $ A $ B',{\thinspace}ts)

   817         end

   818       else list_comb (Const (r, dummyT) $ A $ B, ts)

   819   | dependent_tr' _ _ = raise Match;

   820 \end{ttbox}

   821 The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt

   822   dependent_tr'} by a partial application during its installation.

   823 For example, we could set up print translations for both {\tt Pi} and

   824 {\tt Sigma} by including

   825 \begin{ttbox}\index{*ML section}

   826 val print_translation =

   827   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),

   828    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];

   829 \end{ttbox}

   830 within the {\tt ML} section.  The first of these transforms ${\tt

   831   Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or

   832 $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not

   833 depend on~$x$.  It checks this using \ttindex{loose_bnos}, yet another

   834 function from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$

   835 renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt

   836   Bound} nodes referring to the {\tt Abs} node replaced by

   837 $\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound

   838 variable).

   840 We must be careful with types here.  While types of {\tt Const}s are

   841 ignored, type constraints may be printed for some {\tt Free}s and

   842 {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type

   843 \ttindex{dummyT} are never printed with constraint, though.  The line

   844 \begin{ttbox}

   845         let val (x', B') = Syntax.variant_abs' (x, dummyT, B);

   846 \end{ttbox}\index{*Syntax.variant_abs'}

   847 replaces bound variable occurrences in~$B$ by the free variable $x'$ with

   848 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the

   849 correct type~{\tt T}, so this is the only place where a type

   850 constraint might appear.

   852 Also note that we are responsible to mark free identifiers that

   853 actually represent bound variables.  This is achieved by

   854 \ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.

   855 Failing to do so may cause these names to be printed in the wrong

   856 style.  \index{translations|)} \index{syntax!transformations|)}

   859 \section{Token translations} \label{sec:tok_tr}

   860 \index{token translations|(}

   861 %

   862 Isabelle's meta-logic features quite a lot of different kinds of

   863 identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},

   864 {\em bound}, {\em var}.  One might want to have these printed in

   865 different styles, e.g.\ in bold or italic, or even transcribed into

   866 something more readable like $\alpha, \alpha', \beta$ instead of {\tt

   867   'a}, {\tt 'aa}, {\tt 'b} for type variables.  Token translations

   868 provide a means to such ends, enabling the user to install certain

   869 \ML{} functions associated with any logical \rmindex{token class} and

   870 depending on some \rmindex{print mode}.

   872 The logical class of identifiers can not necessarily be determined by

   873 its syntactic category, though.  For example, consider free vs.\ bound

   874 variables.  So Isabelle's pretty printing mechanism, starting from

   875 fully typed terms, has to be careful to preserve this additional

   876 information\footnote{This is done by marking atoms in abstract syntax

   877   trees appropriately.  The marks are actually visible by print

   878   translation functions -- they are just special constants applied to

   879   atomic asts, for example \texttt{("_bound" x)}.}.  In particular,

   880 user-supplied print translation functions operating on terms have to

   881 be well-behaved in this respect.  Free identifiers introduced to

   882 represent bound variables have to be marked appropriately (cf.\ the

   883 example at the end of \S\ref{sec:tr_funs}).

   885 \medskip Token translations may be installed by declaring the

   886 \ttindex{token_translation} value within the \texttt{ML} section of a

   887 theory definition file:

   888 \begin{ttbox}

   889 val token_translation: (string * string * (string -> string * int)) list

   890 \end{ttbox}\index{token_translation}

   891 The elements of this list are of the form $(m, c, f)$, where $m$ is a

   892 print mode identifier, $c$ a token class, and $f\colon string \to

   893 string \times int$ the actual translation function.  Assuming that $x$

   894 is of identifier class $c$, and print mode $m$ is the first one of all

   895 currently active modes that provide some translation for $c$, then $x$

   896 is output according to $f(x) = (x', len)$.  Thereby $x'$ is the

   897 modified identifier name and $len$ its visual length approximated in

   898 terms of whole characters.  Thus $x'$ may include non-printing parts

   899 like control sequences or markup information for typesetting systems.

   901 %FIXME example (?)

   903 \index{token translations|)}

   906 %%% Local Variables:

   907 %%% mode: latex

   908 %%% TeX-master: "ref"

   909 %%% End:

author	wenzelm
	Mon, 24 Aug 1998 19:12:13 +0200
changeset 5371	e27558a68b8d
parent 4597	a0bdee64194c
child 6343	97c697a32b73
permissions	-rw-r--r--