src/Tools/WWW_Find/doc/design.tex
author wenzelm
Mon, 11 Feb 2013 14:39:04 +0100
changeset 52053 d90218288d51
parent 36869 952b2b102a0a
permissions -rw-r--r--
make WWW_Find work again, now that its ML modules reside within a theory context (cf. bf5b45870110) -- patch by Rafal Kolanski;
     1 % vim:nojs: tw=76 sw=4 sts=4 fo=awn fdm=marker
     2 %
     3 % 20090406 T. Bourke
     4 %	Original document.
     5 %
     6 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     7 \documentclass[a4paper,draft]{article} % XXX
     8 %\documentclass[a4paper,final]{article}
     9 % Preamble
    10 \usepackage[T1]{fontenc}
    11 \usepackage{textcomp}
    12 \usepackage{ifdraft}
    13 
    14 \bibliographystyle{abbrv} % alpha
    15 
    16 \newcommand{\refsec}[1]{Section~\ref{sec:#1}}
    17 \newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
    18 \newcommand{\reftab}[1]{Table~\ref{tab:#1}}
    19 
    20 %
    21 \usepackage{acronym}
    22 \usepackage{pst-all}
    23 \usepackage{url}
    24 
    25 \title{Isabelle find\_theorems on the web}
    26 \author{T. Bourke\thanks{NICTA}}
    27 
    28 \special{!/pdfmark where
    29  {pop} {userdict /pdfmark /cleartomark load put} ifelse
    30  [ /Author   (T. Bourke, NICTA)
    31    /Title    (IsabelleWWW: find_theorems
    32 	      ($Date: 2008-10-03 16:09:02 +1000 (Fri, 03 Oct 2008) $))
    33    /Subject  (Web interface to find_theorems)
    34    /Keywords (isabelle, sml, http, www)
    35    /DOCINFO pdfmark}
    36 
    37 \begin{document}
    38 % Title page and abstract
    39 \maketitle
    40 
    41 \begin{abstract}
    42 The Isabelle find\_theorems command processes queries against a theory 
    43 database and returns a list of matching theorems.
    44 It is usually given from the Proof General or command-line interface.
    45 This document describes a web server implementation.
    46 Two design alternatives are presented and an overview of an implementation 
    47 of one is described.
    48 \end{abstract}
    49 
    50 \section{Introduction}\label{sec:intro}
    51 
    52 This document describes the design and implementation of a web interface for 
    53 the Isabelle find\_theorems command.
    54 The design requirements and their consequences are detailed in \refsec{req}.
    55 Two architectures were considered: \begin{enumerate}
    56 
    57 \item \emph{one process}: integrate a web server into Isabelle.
    58 
    59 \item \emph{two processes}: run Isabelle as a web server module.
    60 
    61 \end{enumerate}
    62 A brief evaluation of the one process architecture is presented in 
    63 \refsec{oneproc}.
    64 An implementation of the two process is presented in \refsec{twoproc}.
    65 
    66 \section{Design requirements}\label{sec:req}
    67 
    68 The main requirements are:\begin{enumerate}
    69 \item The system will allow users to search for theorems from a web browser.
    70 \item It will allow queries across disparate Isabelle theories.
    71 \item It will, at a minimum, handle theories from the L4.verified project.
    72 \item It will run on a secure network.
    73 \item There will be at most ten simultaneous users.
    74 \end{enumerate}
    75 
    76 \noindent
    77 Several \emph{a priori} choices are fixed:\begin{enumerate}
    78 \item The search will run against an Isabelle heap.
    79 \item A single heap will be built from the theories to be searched.
    80 \item The system must be persistent, for two reasons: \begin{enumerate}
    81     \item Isabelle is slow to start against large heaps.
    82     \item Later enhancements may require tracking states at the server.
    83 \end{enumerate}
    84 \end{enumerate}
    85 
    86 \section{Evaluation: Isabelle web server}\label{sec:oneproc}
    87 
    88 The advantage of integrating a web server into Isabelle is that the 
    89 find\_theorems service could be provided by a single process, which, in 
    90 principle, would simplify administration.
    91 Implementing even a simple \ac{HTTP} service from scratch is an unacceptable 
    92 cost and fraught with potential problems and limitations.
    93 It is preferable to adapt an existing system.
    94 
    95 As Isabelle is written in \ac{SML}, only \ac{HTTP} services also written in 
    96 \ac{SML} can realistically be considered.
    97 In principle Isabelle compiles on both Poly/ML and \ac{SML/NJ}, but in 
    98 practice the former is faster, more widely used, and better supported.
    99 In particular, the L4.verified project does not build effectively on 
   100 \ac{SML/NJ}.
   101 This further limits the potential to adapt an existing system.
   102 
   103 There are three \ac{SML} web server projects:\\
   104 \centerline{\begin{tabular}{ll}
   105 SMLServer &
   106     \url{http://www.smlserver.org}\\
   107 FoxNet web server &
   108     \url{http://www.cs.cmu.edu/~fox/}\\
   109 Swerve web server &
   110     \url{http://mlton.org/Swerve}
   111 \end{tabular}}
   112 
   113 Unfortunately, none of these projects is suitable.
   114 
   115 The SMLServer is an Apache web server plugin.
   116 It runs \ac{SML} programs that generate dynamic web pages.
   117 SMLServer is based on the MLKit compiler.
   118 It is unlikely that Isabelle and the l4.verified heaps could be compiled in 
   119 MLKit, at least not without significant effort.
   120 
   121 The FoxNet web server was developed as part of the Fox project at \ac{CMU}.
   122 The source is not readily available.
   123 
   124 The Swerve web server is part of an unpublished book on systems programming 
   125 in \ac{SML}.
   126 The source code is available and it is readily patched to run under the 
   127 latest version of SML/NJ (110.67).
   128 Adapting it to Poly/ML would require non-trivial effort because it is based 
   129 on \ac{CML}, whose implementation on SML/NJ makes use of continuations 
   130 (SMLofNJ.cont).
   131 
   132 \section{Implementation: Isabelle web module}\label{sec:twoproc}
   133 
   134 The description of the two process solution is divided into two subsections.
   135 The first contains an overview of the architecture and web server specifics.
   136 The second contains a summary of an \ac{SML} implementation of the web 
   137 module in Isabelle.
   138 
   139 \subsection{Architecture and web server}\label{sec:oneproc:arch}
   140 
   141 \newcommand{\component}[1]{%
   142     \rput(0,0){\psframe(-.8,-.6)(.8,.6)}%
   143     \rput(0,0){\parbox{4.3em}{\centering{#1}}}}
   144 
   145 \begin{figure}
   146 \centering%
   147 \begin{pspicture}(-4.8,0)(3.3,4)%\psgrid
   148     \newpsstyle{conn}{arrows=->}%
   149     %
   150     \rput(-2.2,3.3){\component{web server}}%
   151     \rput( 2.2,3.3){\component{web module}}%
   152     \rput(-2.2,0.7){\component{web client}}%
   153     \rput(-4.2,3.4){%
   154 	\psellipse(0,-.2)(.4,.2)%
   155 	\psframe[linestyle=none,fillstyle=solid,fillcolor=white]%
   156 		(-.4,-.2)(.4,.1)%
   157 	\psellipse(0,.1)(.4,.2)%
   158 	\psline(-.38,-.2)(-.38,.1)\psline(.38,-.2)(.38,.1)%
   159     }%
   160     \psline[style=conn,arrows=<->](-3.0,3.3)(-3.8,3.3)%
   161     %
   162     \rput[rB](3.3,2.15){server}%
   163     \psline[linestyle=dashed](-4.8,2)(3.3,2)%
   164     \rput[rt](3.3,1.90){network}%
   165     %
   166     \rput[B](0.0,3.55){\psframebox*{module protocol}}%
   167     \psline[style=conn](-1.4,3.4)(1.4,3.4)%
   168     \psline[style=conn](1.4,3.2)(-1.4,3.2)%
   169     %
   170     \rput[B]{90}(-2.4,2.0){\psframebox*{\ac{HTTP}}}%
   171     \psline[style=conn](-2.1,2.7)(-2.1,1.3)%
   172     \psline[style=conn](-2.3,1.3)(-2.3,2.7)%
   173 \end{pspicture}
   174 \caption{Overview of web module architecture\label{fig:modulearch}}
   175 \end{figure}
   176 
   177 An overview of a simple web server architecture is presented in 
   178 \reffig{modulearch}.
   179 A \emph{web client} requests a \ac{URL} from a \emph{web server} over 
   180 \ac{HTTP}.
   181 The web server processes the request by fetching static elements from its 
   182 local file system and communicating with \emph{web modules} to dynamically 
   183 generate other elements.
   184 The elements are sent back across the network to the web client.
   185 
   186 The web server communicates with web modules over a \emph{module protocol}, 
   187 which dictates a means of passing requests and receiving responses.
   188 There are several common module protocols.
   189 
   190 In the \ac{CGI}, the web server executes processes to service dynamic 
   191 \acp{URL}.
   192 Request details are placed in environment variables and sent on the standard 
   193 input channels of processes, responses are read from the standard output 
   194 channels of processes and transmitted back to web clients.
   195 
   196 The chief disadvantage of \ac{CGI} is that it creates a new process for 
   197 every request.
   198 Fast \ac{CGI} web modules, on the other hand, run persistently in a loop.
   199 They receive and respond to web server requests over a duplex socket.
   200 The Fast \ac{CGI} protocol is quite complicated.
   201 There are, however, alternatives like \ac{SCGI} that are easier for web 
   202 modules to support.
   203 This is important when programming in \ac{SML} because both the number of 
   204 developers and available libraries are small.
   205 
   206 An \ac{SCGI} web module listens on a socket for requests from a web server.
   207 Requests are sent as stream of bytes.
   208 The first part of the request is a null-delimited sequence of name and value 
   209 pairs.
   210 The second part is unparsed text sent from the web client.
   211 The web module responds by sending text, usually \ac{HTTP} headers followed 
   212 by \ac{HTML} data, back over the socket.
   213 The whole protocol can be described in two 
   214 pages.\footnote{\url{http://python.ca/scgi/protocol.txt}}
   215 
   216 Both the Apache and Lighttpd web servers support the \ac{SCGI} protocol.
   217 Lighttpd was chosen because it seemed to be small, fast, and easy to 
   218 configure.
   219 Two settings are required to connect lighttpd to an \ac{SCGI} web module: 
   220 \begin{verbatim}
   221 server.modules = ( "mod_scgi" )
   222 scgi.server = ("/isabelle" => ((
   223 		  "host" => "127.0.0.1",
   224 		  "port" => 64000,
   225 		  "check-local" => "disable")))
   226 \end{verbatim}
   227 In this example, the \texttt{scgi.server} entry directs the web server to 
   228 pass all \acp{URL} beginning with \texttt{/isabelle} to the web module 
   229 listening on port \texttt{64000}.
   230 
   231 \subsection{Implementation in \acs{SML}}\label{sec:oneproc:impl}
   232 
   233 \begin{table}
   234 \begin{tabular}{lp{.70\textwidth}}
   235 \textbf{Module}	     &	\textbf{Description}\\\hline
   236 Mime		     &	Rudimentary support for mime types.\\
   237 HttpStatus	     &	\ac{HTTP} header status codes.\\
   238 HttpUtil	     &	Produce \ac{HTTP} headers and parse query strings.\\
   239 Xhtml		     &	Rudimentary support for generating \ac{HTML}.\\
   240 SocketUtil	     &	Routines from The Standard ML Basis Library
   241 			book.\footnote{Chapter 10, Gansner and Reppy, 
   242 			Cambridge University Press.}\\
   243 \textbf{ScgiReq}     &	Parse \ac{SCGI} requests.\\
   244 \textbf{ScgiServer}  &	\ac{SCGI} server event loop and dispatch.\\
   245 \textbf{FindTheorems}&	\ac{HTML} interface to find\_theorems.
   246 \end{tabular}
   247 \caption{Web module implementation\label{tab:moduleimpl}}
   248 \end{table}
   249 
   250 The find\_theorems web module is implemented in \ac{SML}.
   251 It uses elements of the Pure library in Isabelle.
   252 The program comprises the nine modules shown in \reftab{moduleimpl}.
   253 The three most important ones are shown in bold: ScgiReq, ScgiServer, and 
   254 FindTheorems.
   255 
   256 ScgiReq implements the client-side of the \ac{SCGI} protocol.
   257 It maps a binary input stream into a request data type.
   258 
   259 ScgiServer is a generic, multi-threaded \ac{SCGI} request dispatcher.
   260 It accepts \ac{SCGI} requests on a designated socket, selects an appropriate 
   261 handler from an internal list, and calls the handler in a new thread.
   262 
   263 FindTheorems registers a handler with ScgiServer.
   264 It parses \ac{SCGI}/\ac{HTTP} requests, calls the Isabelle find\_theorems 
   265 function, and returns the results as \ac{HTML}.
   266 The \ac{HTML} generation of individual theorems is handled by the \ac{HTML} 
   267 print mode of Isabelle, but the form fields and page structure were manually 
   268 implemented.
   269 
   270 The server is started by calling \texttt{ScgiServer.server}.
   271 Scripts have been created to automate this process.
   272 
   273 \subsection{Handling symbols}\label{sec:unicode}
   274 
   275 Isabelle theorems are usually written in mathematical notation.
   276 Internally, however, Isabelle only manipulates \acs{ASCII} strings.
   277 Symbols are encoded by strings that begin with a backslash and contain a 
   278 symbol name between angle braces, for example, the symbol~$\longrightarrow$ 
   279 becomes~\verb+\<longrightarrow>+.
   280 The existing Thy/Html module in the Isabelle Pure theory converts many of 
   281 these symbols to \ac{HTML} entities.
   282 Custom routines are required to convert the missing symbols to \ac{HTML} 
   283 \emph{numeric character references}, which are the Unicode codepoints of 
   284 symbols printed in decimal between \verb+&#+ and \verb+;+.
   285 Further, other routines were required for converting \acs{UTF-8} encoded 
   286 strings sent from web browsers into Isabelle's symbol encoding.
   287 
   288 Isabelle is distributed with a text file that maps Isabelle symbols to 
   289 Unicode codepoints.
   290 A module was written to parse this file into symbol tables that map back and 
   291 forth between Isabelle symbols and Unicode codepoints, and also between 
   292 Isabelle \acs{ASCII} abbreviations (like \verb+-->+ for $\longrightarrow$) 
   293 and Unicode codepoints.
   294 
   295 The conversion from Isabelle symbols to \ac{HTML} numeric character 
   296 references is handled by a new printing mode, which is based in large part 
   297 on the existing \ac{HTML} printing mode.
   298 The new mode is used in combination with the existing \texttt{xsymbol} mode, 
   299 to ensure that Isabelle symbols are used instead of \acs{ASCII} 
   300 abbreviations.
   301 
   302 The conversion from \acs{UTF-8} is handled by a custom routine.
   303 Additionally, there is a JavaScript routine that converts from Isabelle 
   304 symbol encodings to \acs{UTF-8}, so that users can conveniently view 
   305 manually-entered or pasted mathematical characters in the web browser 
   306 interface.
   307 
   308 \section{Abbreviations}\label{sec:abbr}
   309 
   310 \begin{acronym}[SML/NJ] % longest acronym here
   311     \acro{ASCII}{American Standard Code for Information Interchange}
   312     \acro{CGI}{Common Gateway Interface}
   313     \acro{CML}{Concurrent ML}
   314     \acro{CMU}{Carnegie Mellon University}
   315     \acro{HTML}{Hyper Text Markup Language}
   316     \acro{HTTP}{Hyper Text Transfer Protocol}
   317     \acro{ML}{Meta Language}
   318     \acro{SCGI}{Simple CGI}
   319     \acro{SML}{Standard ML}
   320     \acro{SML/NJ}{Standard ML of New Jersey}
   321     \acro{URL}{Universal Resource Locator}
   322     \acro{UTF-8}{8-bit Unicode Transformation Format}
   323     \acro{WWW}{World Wide Web}
   324 \end{acronym}
   325 
   326 \end{document}