make WWW_Find work again, now that its ML modules reside within a theory context (cf. bf5b45870110) -- patch by Rafal Kolanski;
1 % vim:nojs: tw=76 sw=4 sts=4 fo=awn fdm=marker
6 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7 \documentclass[a4paper,draft]{article} % XXX
8 %\documentclass[a4paper,final]{article}
10 \usepackage[T1]{fontenc}
14 \bibliographystyle{abbrv} % alpha
16 \newcommand{\refsec}[1]{Section~\ref{sec:#1}}
17 \newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
18 \newcommand{\reftab}[1]{Table~\ref{tab:#1}}
25 \title{Isabelle find\_theorems on the web}
26 \author{T. Bourke\thanks{NICTA}}
28 \special{!/pdfmark where
29 {pop} {userdict /pdfmark /cleartomark load put} ifelse
30 [ /Author (T. Bourke, NICTA)
31 /Title (IsabelleWWW: find_theorems
32 ($Date: 2008-10-03 16:09:02 +1000 (Fri, 03 Oct 2008) $))
33 /Subject (Web interface to find_theorems)
34 /Keywords (isabelle, sml, http, www)
38 % Title page and abstract
42 The Isabelle find\_theorems command processes queries against a theory
43 database and returns a list of matching theorems.
44 It is usually given from the Proof General or command-line interface.
45 This document describes a web server implementation.
46 Two design alternatives are presented and an overview of an implementation
50 \section{Introduction}\label{sec:intro}
52 This document describes the design and implementation of a web interface for
53 the Isabelle find\_theorems command.
54 The design requirements and their consequences are detailed in \refsec{req}.
55 Two architectures were considered: \begin{enumerate}
57 \item \emph{one process}: integrate a web server into Isabelle.
59 \item \emph{two processes}: run Isabelle as a web server module.
62 A brief evaluation of the one process architecture is presented in
64 An implementation of the two process is presented in \refsec{twoproc}.
66 \section{Design requirements}\label{sec:req}
68 The main requirements are:\begin{enumerate}
69 \item The system will allow users to search for theorems from a web browser.
70 \item It will allow queries across disparate Isabelle theories.
71 \item It will, at a minimum, handle theories from the L4.verified project.
72 \item It will run on a secure network.
73 \item There will be at most ten simultaneous users.
77 Several \emph{a priori} choices are fixed:\begin{enumerate}
78 \item The search will run against an Isabelle heap.
79 \item A single heap will be built from the theories to be searched.
80 \item The system must be persistent, for two reasons: \begin{enumerate}
81 \item Isabelle is slow to start against large heaps.
82 \item Later enhancements may require tracking states at the server.
86 \section{Evaluation: Isabelle web server}\label{sec:oneproc}
88 The advantage of integrating a web server into Isabelle is that the
89 find\_theorems service could be provided by a single process, which, in
90 principle, would simplify administration.
91 Implementing even a simple \ac{HTTP} service from scratch is an unacceptable
92 cost and fraught with potential problems and limitations.
93 It is preferable to adapt an existing system.
95 As Isabelle is written in \ac{SML}, only \ac{HTTP} services also written in
96 \ac{SML} can realistically be considered.
97 In principle Isabelle compiles on both Poly/ML and \ac{SML/NJ}, but in
98 practice the former is faster, more widely used, and better supported.
99 In particular, the L4.verified project does not build effectively on
101 This further limits the potential to adapt an existing system.
103 There are three \ac{SML} web server projects:\\
104 \centerline{\begin{tabular}{ll}
106 \url{http://www.smlserver.org}\\
108 \url{http://www.cs.cmu.edu/~fox/}\\
110 \url{http://mlton.org/Swerve}
113 Unfortunately, none of these projects is suitable.
115 The SMLServer is an Apache web server plugin.
116 It runs \ac{SML} programs that generate dynamic web pages.
117 SMLServer is based on the MLKit compiler.
118 It is unlikely that Isabelle and the l4.verified heaps could be compiled in
119 MLKit, at least not without significant effort.
121 The FoxNet web server was developed as part of the Fox project at \ac{CMU}.
122 The source is not readily available.
124 The Swerve web server is part of an unpublished book on systems programming
126 The source code is available and it is readily patched to run under the
127 latest version of SML/NJ (110.67).
128 Adapting it to Poly/ML would require non-trivial effort because it is based
129 on \ac{CML}, whose implementation on SML/NJ makes use of continuations
132 \section{Implementation: Isabelle web module}\label{sec:twoproc}
134 The description of the two process solution is divided into two subsections.
135 The first contains an overview of the architecture and web server specifics.
136 The second contains a summary of an \ac{SML} implementation of the web
139 \subsection{Architecture and web server}\label{sec:oneproc:arch}
141 \newcommand{\component}[1]{%
142 \rput(0,0){\psframe(-.8,-.6)(.8,.6)}%
143 \rput(0,0){\parbox{4.3em}{\centering{#1}}}}
147 \begin{pspicture}(-4.8,0)(3.3,4)%\psgrid
148 \newpsstyle{conn}{arrows=->}%
150 \rput(-2.2,3.3){\component{web server}}%
151 \rput( 2.2,3.3){\component{web module}}%
152 \rput(-2.2,0.7){\component{web client}}%
154 \psellipse(0,-.2)(.4,.2)%
155 \psframe[linestyle=none,fillstyle=solid,fillcolor=white]%
157 \psellipse(0,.1)(.4,.2)%
158 \psline(-.38,-.2)(-.38,.1)\psline(.38,-.2)(.38,.1)%
160 \psline[style=conn,arrows=<->](-3.0,3.3)(-3.8,3.3)%
162 \rput[rB](3.3,2.15){server}%
163 \psline[linestyle=dashed](-4.8,2)(3.3,2)%
164 \rput[rt](3.3,1.90){network}%
166 \rput[B](0.0,3.55){\psframebox*{module protocol}}%
167 \psline[style=conn](-1.4,3.4)(1.4,3.4)%
168 \psline[style=conn](1.4,3.2)(-1.4,3.2)%
170 \rput[B]{90}(-2.4,2.0){\psframebox*{\ac{HTTP}}}%
171 \psline[style=conn](-2.1,2.7)(-2.1,1.3)%
172 \psline[style=conn](-2.3,1.3)(-2.3,2.7)%
174 \caption{Overview of web module architecture\label{fig:modulearch}}
177 An overview of a simple web server architecture is presented in
179 A \emph{web client} requests a \ac{URL} from a \emph{web server} over
181 The web server processes the request by fetching static elements from its
182 local file system and communicating with \emph{web modules} to dynamically
183 generate other elements.
184 The elements are sent back across the network to the web client.
186 The web server communicates with web modules over a \emph{module protocol},
187 which dictates a means of passing requests and receiving responses.
188 There are several common module protocols.
190 In the \ac{CGI}, the web server executes processes to service dynamic
192 Request details are placed in environment variables and sent on the standard
193 input channels of processes, responses are read from the standard output
194 channels of processes and transmitted back to web clients.
196 The chief disadvantage of \ac{CGI} is that it creates a new process for
198 Fast \ac{CGI} web modules, on the other hand, run persistently in a loop.
199 They receive and respond to web server requests over a duplex socket.
200 The Fast \ac{CGI} protocol is quite complicated.
201 There are, however, alternatives like \ac{SCGI} that are easier for web
203 This is important when programming in \ac{SML} because both the number of
204 developers and available libraries are small.
206 An \ac{SCGI} web module listens on a socket for requests from a web server.
207 Requests are sent as stream of bytes.
208 The first part of the request is a null-delimited sequence of name and value
210 The second part is unparsed text sent from the web client.
211 The web module responds by sending text, usually \ac{HTTP} headers followed
212 by \ac{HTML} data, back over the socket.
213 The whole protocol can be described in two
214 pages.\footnote{\url{http://python.ca/scgi/protocol.txt}}
216 Both the Apache and Lighttpd web servers support the \ac{SCGI} protocol.
217 Lighttpd was chosen because it seemed to be small, fast, and easy to
219 Two settings are required to connect lighttpd to an \ac{SCGI} web module:
221 server.modules = ( "mod_scgi" )
222 scgi.server = ("/isabelle" => ((
223 "host" => "127.0.0.1",
225 "check-local" => "disable")))
227 In this example, the \texttt{scgi.server} entry directs the web server to
228 pass all \acp{URL} beginning with \texttt{/isabelle} to the web module
229 listening on port \texttt{64000}.
231 \subsection{Implementation in \acs{SML}}\label{sec:oneproc:impl}
234 \begin{tabular}{lp{.70\textwidth}}
235 \textbf{Module} & \textbf{Description}\\\hline
236 Mime & Rudimentary support for mime types.\\
237 HttpStatus & \ac{HTTP} header status codes.\\
238 HttpUtil & Produce \ac{HTTP} headers and parse query strings.\\
239 Xhtml & Rudimentary support for generating \ac{HTML}.\\
240 SocketUtil & Routines from The Standard ML Basis Library
241 book.\footnote{Chapter 10, Gansner and Reppy,
242 Cambridge University Press.}\\
243 \textbf{ScgiReq} & Parse \ac{SCGI} requests.\\
244 \textbf{ScgiServer} & \ac{SCGI} server event loop and dispatch.\\
245 \textbf{FindTheorems}& \ac{HTML} interface to find\_theorems.
247 \caption{Web module implementation\label{tab:moduleimpl}}
250 The find\_theorems web module is implemented in \ac{SML}.
251 It uses elements of the Pure library in Isabelle.
252 The program comprises the nine modules shown in \reftab{moduleimpl}.
253 The three most important ones are shown in bold: ScgiReq, ScgiServer, and
256 ScgiReq implements the client-side of the \ac{SCGI} protocol.
257 It maps a binary input stream into a request data type.
259 ScgiServer is a generic, multi-threaded \ac{SCGI} request dispatcher.
260 It accepts \ac{SCGI} requests on a designated socket, selects an appropriate
261 handler from an internal list, and calls the handler in a new thread.
263 FindTheorems registers a handler with ScgiServer.
264 It parses \ac{SCGI}/\ac{HTTP} requests, calls the Isabelle find\_theorems
265 function, and returns the results as \ac{HTML}.
266 The \ac{HTML} generation of individual theorems is handled by the \ac{HTML}
267 print mode of Isabelle, but the form fields and page structure were manually
270 The server is started by calling \texttt{ScgiServer.server}.
271 Scripts have been created to automate this process.
273 \subsection{Handling symbols}\label{sec:unicode}
275 Isabelle theorems are usually written in mathematical notation.
276 Internally, however, Isabelle only manipulates \acs{ASCII} strings.
277 Symbols are encoded by strings that begin with a backslash and contain a
278 symbol name between angle braces, for example, the symbol~$\longrightarrow$
279 becomes~\verb+\<longrightarrow>+.
280 The existing Thy/Html module in the Isabelle Pure theory converts many of
281 these symbols to \ac{HTML} entities.
282 Custom routines are required to convert the missing symbols to \ac{HTML}
283 \emph{numeric character references}, which are the Unicode codepoints of
284 symbols printed in decimal between \verb+&#+ and \verb+;+.
285 Further, other routines were required for converting \acs{UTF-8} encoded
286 strings sent from web browsers into Isabelle's symbol encoding.
288 Isabelle is distributed with a text file that maps Isabelle symbols to
290 A module was written to parse this file into symbol tables that map back and
291 forth between Isabelle symbols and Unicode codepoints, and also between
292 Isabelle \acs{ASCII} abbreviations (like \verb+-->+ for $\longrightarrow$)
293 and Unicode codepoints.
295 The conversion from Isabelle symbols to \ac{HTML} numeric character
296 references is handled by a new printing mode, which is based in large part
297 on the existing \ac{HTML} printing mode.
298 The new mode is used in combination with the existing \texttt{xsymbol} mode,
299 to ensure that Isabelle symbols are used instead of \acs{ASCII}
302 The conversion from \acs{UTF-8} is handled by a custom routine.
303 Additionally, there is a JavaScript routine that converts from Isabelle
304 symbol encodings to \acs{UTF-8}, so that users can conveniently view
305 manually-entered or pasted mathematical characters in the web browser
308 \section{Abbreviations}\label{sec:abbr}
310 \begin{acronym}[SML/NJ] % longest acronym here
311 \acro{ASCII}{American Standard Code for Information Interchange}
312 \acro{CGI}{Common Gateway Interface}
313 \acro{CML}{Concurrent ML}
314 \acro{CMU}{Carnegie Mellon University}
315 \acro{HTML}{Hyper Text Markup Language}
316 \acro{HTTP}{Hyper Text Transfer Protocol}
317 \acro{ML}{Meta Language}
318 \acro{SCGI}{Simple CGI}
319 \acro{SML}{Standard ML}
320 \acro{SML/NJ}{Standard ML of New Jersey}
321 \acro{URL}{Universal Resource Locator}
322 \acro{UTF-8}{8-bit Unicode Transformation Format}
323 \acro{WWW}{World Wide Web}