3 theory "ML" imports base begin
5 chapter {* Advanced ML programming *}
10 Like any style guide, also this one should not be interpreted dogmatically, but
11 with care and discernment. It consists of a collection of
12 recommendations which have been turned out useful over long years of
13 Isabelle system development and are supposed to support writing of readable
14 and managable code. Special cases encourage derivations,
15 as far as you know what you are doing.
16 \footnote{This style guide is loosely based on
17 \url{http://caml.inria.fr/resources/doc/guides/guidelines.en.html}}
21 \item[fundamental law of programming]
22 Whenever writing code, keep in mind: A program is
23 written once, modified ten times, and read
24 hundred times. So simplify its writing,
25 always keep future modifications in mind,
26 and never jeopardize readability. Every second you hesitate
27 to spend on making your code more clear you will
28 have to spend ten times understanding what you have
31 \item[white space matters]
32 Treat white space in your code as if it determines
37 \item The space bar is the easiest key to find on the keyboard,
38 press it as often as necessary. @{verbatim "2 + 2"} is better
39 than @{verbatim "2+2"}, likewise @{verbatim "f (x, y)"} is
40 better than @{verbatim "f(x,y)"}.
42 \item Restrict your lines to 80 characters. This will allow
43 you to keep the beginning of a line in view while watching
44 its end.\footnote{To acknowledge the lax practice of
45 text editing these days, we tolerate as much as 100
46 characters per line, but anything beyond 120 is not
47 considered proper source text.}
49 \item Ban tabulators; they are a context-sensitive formatting
50 feature and likely to confuse anyone not using your favorite
51 editor.\footnote{Some modern programming language even
52 forbid tabulators altogether according to the formal syntax
55 \item Get rid of trailing whitespace. Instead, do not
56 suppress a trailing newline at the end of your files.
58 \item Choose a generally accepted style of indentation,
59 then use it systematically throughout the whole
60 application. An indentation of two spaces is appropriate.
61 Avoid dangling indentation.
65 \item[cut-and-paste succeeds over copy-and-paste]
66 \emph{Never} copy-and-paste code when programming. If you
67 need the same piece of code twice, introduce a
68 reasonable auxiliary function (if there is no
69 such function, very likely you got something wrong).
70 Any copy-and-paste will turn out to be painful
71 when something has to be changed later on.
74 are a device which requires careful thinking before using
75 it. The best comment for your code should be the code itself.
76 Prefer efforts to write clear, understandable code
77 over efforts to explain nasty code.
79 \item[functional programming is based on functions]
80 Model things as abstract as appropriate. Avoid unnecessarrily
81 concrete datatype representations. For example, consider a function
82 taking some table data structure as argument and performing
83 lookups on it. Instead of taking a table, it could likewise
84 take just a lookup function. Accustom
85 your way of coding to the level of expressiveness a functional
86 programming language is giving onto you.
89 are often in the way. When there is no striking argument
90 to tuple function arguments, just write your function curried.
93 Any name should tell its purpose as exactly as possible, while
94 keeping its length to the absolutely necessary minimum. Always
95 give the same name to function arguments which have the same
96 meaning. Separate words by underscores (@{verbatim
97 int_of_string}, not @{verbatim intOfString}).\footnote{Some
98 recent tools for Emacs include special precautions to cope with
99 bumpy names in @{text "camelCase"}, e.g.\ for improved on-screen
100 readability. It is easier to abstain from using such names in the
107 section {* Thread-safe programming *}
110 Recent versions of Poly/ML (5.1 or later) support multithreaded
111 execution based on native operating system threads of the
112 underlying platform. Thus threads will actually be executed in
113 parallel on multi-core systems. A speedup-factor of approximately
114 2--4 can be expected for large well-structured Isabelle sessions,
115 where theories are organized as a graph with sufficiently many
118 Threads lack the memory protection of separate processes, but
119 operate concurrently on shared heap memory. This has the advantage
120 that results of independent computations are immediately available
121 to other threads, without requiring explicit communication,
122 reloading, or even recoding of data.
124 On the other hand, some programming guidelines need to be observed
125 in order to make unprotected parallelism work out smoothly. While
126 the ML system implementation is responsible to maintain basic
127 integrity of the representation of ML values in memory, the
128 application programmer needs to ensure that multithreaded execution
129 does not break the intended semantics.
131 \medskip \paragraph{Critical shared resources.} Actually only those
132 parts outside the purely functional world of ML are critical. In
133 particular, this covers
137 \item global references (or arrays), i.e.\ those that persist over
138 several invocations of associated operations,\footnote{This is
139 independent of the visibility of such mutable values in the toplevel
142 \item global ML bindings in the toplevel environment (@{verbatim
143 "type"}, @{verbatim val}, @{verbatim "structure"} etc.) due to
144 run-time invocation of the compiler,
146 \item direct I/O on shared channels, notably @{text "stdin"}, @{text
147 "stdout"}, @{text "stderr"}.
151 The majority of tools implemented within the Isabelle/Isar framework
152 will not require any of these critical elements: nothing special
153 needs to be observed when staying in the purely functional fragment
154 of ML. Note that output via the official Isabelle channels does not
155 even count as direct I/O in the above sense, so the operations @{ML
156 "writeln"}, @{ML "warning"}, @{ML "tracing"} etc.\ are safe.
158 \paragraph{Multithreading in Isabelle/Isar.} Our parallel execution
159 model is centered around the theory loader. Whenever a given
160 subgraph of theories needs to be updated, the system schedules a
161 number of threads to process the sources as required, while
162 observing their dependencies. Thus concurrency is limited to
163 independent nodes according to the theory import relation.
165 Any user-code that works relatively to the present background theory
166 is already safe. Contextual data may be easily stored within the
167 theory or proof context, thanks to the generic data concept of
168 Isabelle/Isar (see \secref{sec:context-data}). This greatly
169 diminishes the demand for global state information in the first
172 \medskip In rare situations where actual mutable content needs to be
173 manipulated, Isabelle provides a single \emph{critical section} that
174 may be entered while preventing any other thread from doing the
175 same. Entering the critical section without contention is very
176 fast, and several basic system operations do so frequently. This
177 also means that each thread should leave the critical section
178 quickly, otherwise parallel execution performance may degrade
181 Despite this potential bottle-neck, we refrain from fine-grained
182 locking mechanisms: the restriction to a single lock prevents
183 deadlocks without demanding further considerations in user programs.
185 \paragraph{Good conduct of impure programs.} The following
186 guidelines enable non-functional programs to participate in
191 \item Minimize global state information. Using proper theory and
192 proof context data will actually return to functional update of
193 values, without any special precautions for multithreading. Apart
194 from the fully general functors for theory and proof data (see
195 \secref{sec:context-data}) there are drop-in replacements that
196 emulate primitive references for common cases of \emph{configuration
197 options} for type @{ML_type "bool"}/@{ML_type "int"}/@{ML_type
198 "string"} (see structure @{ML_struct Config} and @{ML
199 Attrib.config_bool} etc.), and lists of theorems (see functor
200 @{ML_functor NamedThmsFun}).
202 \item Keep components with local state information
203 \emph{re-entrant}. Instead of poking initial values into (private)
204 global references, create a new state record on each invocation, and
205 pass that through any auxiliary functions of the component. The
206 state record may well contain mutable references, without requiring
207 any special synchronizations, as long as each invocation sees its
208 own copy. Occasionally, one might even return to plain functional
209 updates on non-mutable record values here.
211 \item Isolate process configuration flags. The main legitimate
212 application of global references is to configure the whole process
213 in a certain way, essentially affecting all threads. A typical
214 example is the @{ML show_types} flag, which tells the pretty printer
215 to output explicit type information for terms. Such flags usually
216 do not affect the functionality of the core system, but only the
217 view being presented to the user.
219 Occasionally, such global process flags are treated like implicit
220 arguments to certain operations, by using the @{ML setmp} combinator
221 for safe temporary assignment. Its traditional purpose was to
222 ensure proper recovery of the original value when exceptions are
223 raised in the body, now the functionality is extended to enter the
224 \emph{critical section} (with its usual potential of degrading
227 Note that recovery of plain value passing semantics via @{ML
228 setmp}~@{text "ref value"} assumes that this @{text "ref"} is
229 exclusively manipulated within the critical section. In particular,
230 any persistent global assignment of @{text "ref := value"} needs to
231 be marked critical as well, to prevent intruding another threads
232 local view, and a lost-update in the global scope, too.
234 \item Minimize global ML bindings. Processing theories occasionally
235 affects the global ML environment as well. While each ML
236 compilation unit is safe, the order of scheduling of independent
237 declarations might cause problems when composing several modules
238 later on, due to hiding of previous ML names.
240 This cannot be helped in general, because the ML toplevel lacks the
241 graph structure of the Isabelle theory space. Nevertheless, some
242 sound conventions of keeping global ML names essentially disjoint
243 (e.g.\ with the help of ML structures) prevents the problem to occur
244 in most practical situations.
248 Recall that in an open ``LCF-style'' system like Isabelle/Isar, the
249 user participates in constructing the overall environment. This
250 means that state-based facilities offered by one component will
251 require special caution later on. So minimizing critical elements,
252 by staying within the plain value-oriented view relative to theory
253 or proof contexts most of the time, will also reduce the chance of
254 mishaps occurring to end-users.
259 @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
260 @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
261 @{index_ML setmp: "'a ref -> 'a -> ('b -> 'c) -> 'b -> 'c"} \\
266 \item @{ML NAMED_CRITICAL}~@{text "name f"} evaluates @{text "f ()"}
267 while staying within the critical section of Isabelle/Isar. No
268 other thread may do so at the same time, but non-critical parallel
269 execution will continue. The @{text "name"} argument serves for
270 diagnostic purposes and might help to spot sources of congestion.
272 \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
275 \item @{ML setmp}~@{text "ref value f x"} evaluates @{text "f x"}
276 while staying within the critical section and having @{text "ref :=
277 value"} assigned temporarily. This recovers a value-passing
278 semantics involving global references, regardless of exceptions or
285 chapter {* Basic library functions *}
288 Beyond the proposal of the SML/NJ basis library, Isabelle comes
289 with its own library, from which selected parts are given here.
290 These should encourage a study of the Isabelle sources,
291 in particular files \emph{Pure/library.ML} and \emph{Pure/General/*.ML}.
294 section {* Linear transformations *}
298 @{index_ML "op |> ": "'a * ('a -> 'b) -> 'b"} \\
306 val thy = Theory.copy @{theory}
311 Many problems in functional programming can be thought of
312 as linear transformations, i.e.~a caluclation starts with a
313 particular value @{ML_text "x : foo"} which is then transformed
314 by application of a function @{ML_text "f : foo -> foo"},
315 continued by an application of a function @{ML_text "g : foo -> bar"},
316 and so on. As a canoncial example, take functions enriching
317 a theory by constant declararion and primitive definitions:
319 \smallskip\begin{mldecls}
320 @{ML "Sign.declare_const: Markup.property list -> bstring * typ * mixfix
321 -> theory -> term * theory"} \\
322 @{ML "Thm.add_def: bool -> bool -> bstring * term -> theory -> thm * theory"}
325 Written with naive application, an addition of constant
326 @{term bar} with type @{typ "foo \<Rightarrow> foo"} and
327 a corresponding definition @{term "bar \<equiv> \<lambda>x. x"} would look like:
329 \smallskip\begin{mldecls}
330 @{ML "(fn (t, thy) => Thm.add_def false false
331 (\"bar_def\", Logic.mk_equals (t, @{term \"%x. x\"})) thy)
332 (Sign.declare_const []
333 (\"bar\", @{typ \"foo => foo\"}, NoSyn) thy)"}
336 With increasing numbers of applications, this code gets quite
337 unreadable. Further, it is unintuitive that things are
338 written down in the opposite order as they actually ``happen''.
342 \noindent At this stage, Isabelle offers some combinators which allow
343 for more convenient notation, most notably reverse application:
345 \smallskip\begin{mldecls}
347 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
348 |> (fn (t, thy) => thy
349 |> Thm.add_def false false
350 (\"bar_def\", Logic.mk_equals (t, @{term \"%x. x\"})))"}
356 @{index_ML "op |-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
357 @{index_ML "op |>> ": "('a * 'c) * ('a -> 'b) -> 'b * 'c"} \\
358 @{index_ML "op ||> ": "('c * 'a) * ('a -> 'b) -> 'c * 'b"} \\
359 @{index_ML "op ||>> ": "('c * 'a) * ('a -> 'd * 'b) -> ('c * 'd) * 'b"} \\
364 \noindent Usually, functions involving theory updates also return
365 side results; to use these conveniently, yet another
366 set of combinators is at hand, most notably @{ML "op |->"}
367 which allows curried access to side results:
369 \smallskip\begin{mldecls}
371 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
372 |-> (fn t => Thm.add_def false false
373 (\"bar_def\", Logic.mk_equals (t, @{term \"%x. x\"})))
377 \noindent @{ML "op |>>"} allows for processing side results on their own:
379 \smallskip\begin{mldecls}
381 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
382 |>> (fn t => Logic.mk_equals (t, @{term \"%x. x\"}))
383 |-> (fn def => Thm.add_def false false (\"bar_def\", def))
387 \noindent Orthogonally, @{ML "op ||>"} applies a transformation
388 in the presence of side results which are left unchanged:
390 \smallskip\begin{mldecls}
392 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
393 ||> Sign.add_path \"foobar\"
394 |-> (fn t => Thm.add_def false false
395 (\"bar_def\", Logic.mk_equals (t, @{term \"%x. x\"})))
396 ||> Sign.restore_naming thy
400 \noindent @{ML "op ||>>"} accumulates side results:
402 \smallskip\begin{mldecls}
404 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
405 ||>> Sign.declare_const [] (\"foobar\", @{typ \"foo => foo\"}, NoSyn)
406 |-> (fn (t1, t2) => Thm.add_def false false
407 (\"bar_def\", Logic.mk_equals (t1, t2)))
414 @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
415 @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
420 \noindent This principles naturally lift to \emph{lists} using
421 the @{ML fold} and @{ML fold_map} combinators.
422 The first lifts a single function
423 \begin{quote}\footnotesize
424 @{ML_text "f : 'a -> 'b -> 'b"} to @{ML_text "'a list -> 'b -> 'b"}
427 \begin{quote}\footnotesize
428 @{ML_text "y |> fold f [x1, x2, ..., x_n]"} \\
429 \hspace*{2ex}@{text "\<leadsto>"} @{ML_text "y |> f x1 |> f x2 |> ... |> f x_n"}
431 The second accumulates side results in a list by lifting
433 \begin{quote}\footnotesize
434 @{ML_text "f : 'a -> 'b -> 'c * 'b"} to @{ML_text "'a list -> 'b -> 'c list * 'b"}
437 \begin{quote}\footnotesize
438 @{ML_text "y |> fold_map f [x1, x2, ..., x_n]"} \\
439 \hspace*{2ex}@{text "\<leadsto>"} @{ML_text "y |> f x1 ||>> f x2 ||>> ... ||>> f x_n"} \\
440 \hspace*{6ex}@{ML_text "||> (fn ((z1, z2), ..., z_n) => [z1, z2, ..., z_n])"}
445 \smallskip\begin{mldecls}
447 val consts = [\"foo\", \"bar\"];
450 |> fold_map (fn const => Sign.declare_const []
451 (const, @{typ \"foo => foo\"}, NoSyn)) consts
452 |>> map (fn t => Logic.mk_equals (t, @{term \"%x. x\"}))
453 |-> (fn defs => fold_map (fn def =>
454 Thm.add_def false false (\"\", def)) defs)
461 @{index_ML "op #> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
462 @{index_ML "op #-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
463 @{index_ML "op #>> ": "('a -> 'c * 'b) * ('c -> 'd) -> 'a -> 'd * 'b"} \\
464 @{index_ML "op ##> ": "('a -> 'c * 'b) * ('b -> 'd) -> 'a -> 'c * 'd"} \\
465 @{index_ML "op ##>> ": "('a -> 'c * 'b) * ('b -> 'e * 'd) -> 'a -> ('c * 'e) * 'd"} \\
470 \noindent All those linear combinators also exist in higher-order
471 variants which do not expect a value on the left hand side
477 @{index_ML "op ` ": "('b -> 'a) -> 'b -> 'a * 'b"} \\
478 @{index_ML tap: "('b -> 'a) -> 'b -> 'b"} \\
483 \noindent These operators allow to ``query'' a context
484 in a series of context transformations:
486 \smallskip\begin{mldecls}
488 |> tap (fn _ => writeln \"now adding constant\")
489 |> Sign.declare_const [] (\"bar\", @{typ \"foo => foo\"}, NoSyn)
490 ||>> `(fn thy => Sign.declared_const thy
491 (Sign.full_name thy \"foobar\"))
492 |-> (fn (t, b) => if b then I
493 else Sign.declare_const []
494 (\"foobar\", @{typ \"foo => foo\"}, NoSyn) #> snd)
499 section {* Options and partiality *}
503 @{index_ML is_some: "'a option -> bool"} \\
504 @{index_ML is_none: "'a option -> bool"} \\
505 @{index_ML the: "'a option -> 'a"} \\
506 @{index_ML these: "'a list option -> 'a list"} \\
507 @{index_ML the_list: "'a option -> 'a list"} \\
508 @{index_ML the_default: "'a -> 'a option -> 'a"} \\
509 @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
510 @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
515 Standard selector functions on @{text option}s are provided. The
516 @{ML try} and @{ML can} functions provide a convenient interface for
517 handling exceptions -- both take as arguments a function @{ML_text f}
518 together with a parameter @{ML_text x} and handle any exception during
519 the evaluation of the application of @{ML_text f} to @{ML_text x}, either
520 return a lifted result (@{ML NONE} on failure) or a boolean value
521 (@{ML false} on failure).
525 section {* Common data structures *}
527 subsection {* Lists (as set-like data structures) *}
531 @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
532 @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
533 @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
534 @{index_ML merge: "('a * 'a -> bool) -> 'a list * 'a list -> 'a list"} \\
539 Lists are often used as set-like data structures -- set-like in
540 the sense that they support a notion of @{ML member}-ship,
541 @{ML insert}-ing and @{ML remove}-ing, but are order-sensitive.
542 This is convenient when implementing a history-like mechanism:
543 @{ML insert} adds an element \emph{to the front} of a list,
544 if not yet present; @{ML remove} removes \emph{all} occurences
545 of a particular element. Correspondingly @{ML merge} implements a
546 a merge on two lists suitable for merges of context data
547 (\secref{sec:context-theory}).
549 Functions are parametrized by an explicit equality function
550 to accomplish overloaded equality; in most cases of monomorphic
551 equality, writing @{ML "op ="} should suffice.
554 subsection {* Association lists *}
558 @{index_ML_exn AList.DUP} \\
559 @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
560 @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
561 @{index_ML AList.update: "('a * 'a -> bool) -> ('a * 'b) -> ('a * 'b) list -> ('a * 'b) list"} \\
562 @{index_ML AList.default: "('a * 'a -> bool) -> ('a * 'b) -> ('a * 'b) list -> ('a * 'b) list"} \\
563 @{index_ML AList.delete: "('a * 'b -> bool) -> 'a -> ('b * 'c) list -> ('b * 'c) list"} \\
564 @{index_ML AList.map_entry: "('a * 'b -> bool) -> 'a
565 -> ('c -> 'c) -> ('b * 'c) list -> ('b * 'c) list"} \\
566 @{index_ML AList.map_default: "('a * 'a -> bool) -> 'a * 'b -> ('b -> 'b)
567 -> ('a * 'b) list -> ('a * 'b) list"} \\
568 @{index_ML AList.join: "('a * 'a -> bool) -> ('a -> 'b * 'b -> 'b) (*exception DUP*)
569 -> ('a * 'b) list * ('a * 'b) list -> ('a * 'b) list (*exception AList.DUP*)"} \\
570 @{index_ML AList.merge: "('a * 'a -> bool) -> ('b * 'b -> bool)
571 -> ('a * 'b) list * ('a * 'b) list -> ('a * 'b) list (*exception AList.DUP*)"}
576 Association lists can be seens as an extension of set-like lists:
577 on the one hand, they may be used to implement finite mappings,
578 on the other hand, they remain order-sensitive and allow for
579 multiple key-value-pair with the same key: @{ML AList.lookup}
580 returns the \emph{first} value corresponding to a particular
581 key, if present. @{ML AList.update} updates
582 the \emph{first} occurence of a particular key; if no such
583 key exists yet, the key-value-pair is added \emph{to the front}.
584 @{ML AList.delete} only deletes the \emph{first} occurence of a key.
585 @{ML AList.merge} provides an operation suitable for merges of context data
586 (\secref{sec:context-theory}), where an equality parameter on
587 values determines whether a merge should be considered a conflict.
588 A slightly generalized operation if implementend by the @{ML AList.join}
589 function which allows for explicit conflict resolution.
592 subsection {* Tables *}
596 @{index_ML_type "'a Symtab.table"} \\
597 @{index_ML_exn Symtab.DUP: string} \\
598 @{index_ML_exn Symtab.SAME} \\
599 @{index_ML_exn Symtab.UNDEF: string} \\
600 @{index_ML Symtab.empty: "'a Symtab.table"} \\
601 @{index_ML Symtab.lookup: "'a Symtab.table -> string -> 'a option"} \\
602 @{index_ML Symtab.defined: "'a Symtab.table -> string -> bool"} \\
603 @{index_ML Symtab.update: "(string * 'a) -> 'a Symtab.table -> 'a Symtab.table"} \\
604 @{index_ML Symtab.default: "string * 'a -> 'a Symtab.table -> 'a Symtab.table"} \\
605 @{index_ML Symtab.delete: "string
606 -> 'a Symtab.table -> 'a Symtab.table (*exception Symtab.UNDEF*)"} \\
607 @{index_ML Symtab.map_entry: "string -> ('a -> 'a)
608 -> 'a Symtab.table -> 'a Symtab.table"} \\
609 @{index_ML Symtab.map_default: "(string * 'a) -> ('a -> 'a)
610 -> 'a Symtab.table -> 'a Symtab.table"} \\
611 @{index_ML Symtab.join: "(string -> 'a * 'a -> 'a) (*exception Symtab.DUP/Symtab.SAME*)
612 -> 'a Symtab.table * 'a Symtab.table
613 -> 'a Symtab.table (*exception Symtab.DUP*)"} \\
614 @{index_ML Symtab.merge: "('a * 'a -> bool)
615 -> 'a Symtab.table * 'a Symtab.table
616 -> 'a Symtab.table (*exception Symtab.DUP*)"}
621 Tables are an efficient representation of finite mappings without
622 any notion of order; due to their efficiency they should be used
623 whenever such pure finite mappings are neccessary.
625 The key type of tables must be given explicitly by instantiating
626 the @{ML_functor TableFun} functor which takes the key type
627 together with its @{ML_type order}; for convience, we restrict
628 here to the @{ML_struct Symtab} instance with @{ML_type string}
631 Most table functions correspond to those of association lists.