src/Tools/WWW_Find/doc/design.tex
author wenzelm
Mon, 11 Feb 2013 14:39:04 +0100
changeset 52053 d90218288d51
parent 36869 952b2b102a0a
permissions -rw-r--r--
make WWW_Find work again, now that its ML modules reside within a theory context (cf. bf5b45870110) -- patch by Rafal Kolanski;
kleing@33817
     1
% vim:nojs: tw=76 sw=4 sts=4 fo=awn fdm=marker
kleing@33817
     2
%
kleing@33817
     3
% 20090406 T. Bourke
kleing@33817
     4
%	Original document.
kleing@33817
     5
%
kleing@33817
     6
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
kleing@33817
     7
\documentclass[a4paper,draft]{article} % XXX
kleing@33817
     8
%\documentclass[a4paper,final]{article}
kleing@33817
     9
% Preamble
kleing@33817
    10
\usepackage[T1]{fontenc}
kleing@33817
    11
\usepackage{textcomp}
kleing@33817
    12
\usepackage{ifdraft}
kleing@33817
    13
kleing@33817
    14
\bibliographystyle{abbrv} % alpha
kleing@33817
    15
kleing@33817
    16
\newcommand{\refsec}[1]{Section~\ref{sec:#1}}
kleing@33817
    17
\newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
kleing@33817
    18
\newcommand{\reftab}[1]{Table~\ref{tab:#1}}
kleing@33817
    19
kleing@33817
    20
%
kleing@33817
    21
\usepackage{acronym}
kleing@33817
    22
\usepackage{pst-all}
kleing@33817
    23
\usepackage{url}
kleing@33817
    24
kleing@33817
    25
\title{Isabelle find\_theorems on the web}
kleing@33817
    26
\author{T. Bourke\thanks{NICTA}}
kleing@33817
    27
kleing@33817
    28
\special{!/pdfmark where
kleing@33817
    29
 {pop} {userdict /pdfmark /cleartomark load put} ifelse
kleing@33817
    30
 [ /Author   (T. Bourke, NICTA)
kleing@33817
    31
   /Title    (IsabelleWWW: find_theorems
kleing@33817
    32
	      ($Date: 2008-10-03 16:09:02 +1000 (Fri, 03 Oct 2008) $))
kleing@33817
    33
   /Subject  (Web interface to find_theorems)
kleing@33817
    34
   /Keywords (isabelle, sml, http, www)
kleing@33817
    35
   /DOCINFO pdfmark}
kleing@33817
    36
kleing@33817
    37
\begin{document}
kleing@33817
    38
% Title page and abstract
kleing@33817
    39
\maketitle
kleing@33817
    40
kleing@33817
    41
\begin{abstract}
kleing@33817
    42
The Isabelle find\_theorems command processes queries against a theory 
kleing@33817
    43
database and returns a list of matching theorems.
kleing@33817
    44
It is usually given from the Proof General or command-line interface.
kleing@33817
    45
This document describes a web server implementation.
kleing@33817
    46
Two design alternatives are presented and an overview of an implementation 
kleing@33817
    47
of one is described.
kleing@33817
    48
\end{abstract}
kleing@33817
    49
kleing@33817
    50
\section{Introduction}\label{sec:intro}
kleing@33817
    51
kleing@33817
    52
This document describes the design and implementation of a web interface for 
kleing@33817
    53
the Isabelle find\_theorems command.
kleing@33817
    54
The design requirements and their consequences are detailed in \refsec{req}.
kleing@33817
    55
Two architectures were considered: \begin{enumerate}
kleing@33817
    56
kleing@33817
    57
\item \emph{one process}: integrate a web server into Isabelle.
kleing@33817
    58
kleing@33817
    59
\item \emph{two processes}: run Isabelle as a web server module.
kleing@33817
    60
kleing@33817
    61
\end{enumerate}
kleing@33817
    62
A brief evaluation of the one process architecture is presented in 
kleing@33817
    63
\refsec{oneproc}.
kleing@33817
    64
An implementation of the two process is presented in \refsec{twoproc}.
kleing@33817
    65
kleing@33817
    66
\section{Design requirements}\label{sec:req}
kleing@33817
    67
kleing@33817
    68
The main requirements are:\begin{enumerate}
kleing@33817
    69
\item The system will allow users to search for theorems from a web browser.
kleing@33817
    70
\item It will allow queries across disparate Isabelle theories.
kleing@33817
    71
\item It will, at a minimum, handle theories from the L4.verified project.
kleing@33817
    72
\item It will run on a secure network.
kleing@33817
    73
\item There will be at most ten simultaneous users.
kleing@33817
    74
\end{enumerate}
kleing@33817
    75
kleing@33817
    76
\noindent
kleing@33817
    77
Several \emph{a priori} choices are fixed:\begin{enumerate}
kleing@33817
    78
\item The search will run against an Isabelle heap.
kleing@33817
    79
\item A single heap will be built from the theories to be searched.
kleing@33817
    80
\item The system must be persistent, for two reasons: \begin{enumerate}
kleing@33817
    81
    \item Isabelle is slow to start against large heaps.
kleing@33817
    82
    \item Later enhancements may require tracking states at the server.
kleing@33817
    83
\end{enumerate}
kleing@33817
    84
\end{enumerate}
kleing@33817
    85
kleing@33817
    86
\section{Evaluation: Isabelle web server}\label{sec:oneproc}
kleing@33817
    87
kleing@33817
    88
The advantage of integrating a web server into Isabelle is that the 
kleing@33817
    89
find\_theorems service could be provided by a single process, which, in 
kleing@33817
    90
principle, would simplify administration.
kleing@33817
    91
Implementing even a simple \ac{HTTP} service from scratch is an unacceptable 
kleing@33817
    92
cost and fraught with potential problems and limitations.
kleing@33817
    93
It is preferable to adapt an existing system.
kleing@33817
    94
kleing@33817
    95
As Isabelle is written in \ac{SML}, only \ac{HTTP} services also written in 
kleing@33817
    96
\ac{SML} can realistically be considered.
kleing@33817
    97
In principle Isabelle compiles on both Poly/ML and \ac{SML/NJ}, but in 
kleing@33817
    98
practice the former is faster, more widely used, and better supported.
kleing@33817
    99
In particular, the L4.verified project does not build effectively on 
kleing@33817
   100
\ac{SML/NJ}.
kleing@33817
   101
This further limits the potential to adapt an existing system.
kleing@33817
   102
kleing@33817
   103
There are three \ac{SML} web server projects:\\
kleing@33817
   104
\centerline{\begin{tabular}{ll}
kleing@33817
   105
SMLServer &
kleing@33817
   106
    \url{http://www.smlserver.org}\\
kleing@33817
   107
FoxNet web server &
kleing@33817
   108
    \url{http://www.cs.cmu.edu/~fox/}\\
kleing@33817
   109
Swerve web server &
kleing@33817
   110
    \url{http://mlton.org/Swerve}
kleing@33817
   111
\end{tabular}}
kleing@33817
   112
kleing@33817
   113
Unfortunately, none of these projects is suitable.
kleing@33817
   114
kleing@33817
   115
The SMLServer is an Apache web server plugin.
kleing@33817
   116
It runs \ac{SML} programs that generate dynamic web pages.
kleing@33817
   117
SMLServer is based on the MLKit compiler.
kleing@33817
   118
It is unlikely that Isabelle and the l4.verified heaps could be compiled in 
kleing@33817
   119
MLKit, at least not without significant effort.
kleing@33817
   120
kleing@33817
   121
The FoxNet web server was developed as part of the Fox project at \ac{CMU}.
kleing@33817
   122
The source is not readily available.
kleing@33817
   123
kleing@33817
   124
The Swerve web server is part of an unpublished book on systems programming 
kleing@33817
   125
in \ac{SML}.
kleing@33817
   126
The source code is available and it is readily patched to run under the 
kleing@33817
   127
latest version of SML/NJ (110.67).
kleing@33817
   128
Adapting it to Poly/ML would require non-trivial effort because it is based 
kleing@33817
   129
on \ac{CML}, whose implementation on SML/NJ makes use of continuations 
kleing@33817
   130
(SMLofNJ.cont).
kleing@33817
   131
kleing@33817
   132
\section{Implementation: Isabelle web module}\label{sec:twoproc}
kleing@33817
   133
kleing@33817
   134
The description of the two process solution is divided into two subsections.
kleing@33817
   135
The first contains an overview of the architecture and web server specifics.
kleing@33817
   136
The second contains a summary of an \ac{SML} implementation of the web 
kleing@33817
   137
module in Isabelle.
kleing@33817
   138
kleing@33817
   139
\subsection{Architecture and web server}\label{sec:oneproc:arch}
kleing@33817
   140
kleing@33817
   141
\newcommand{\component}[1]{%
kleing@33817
   142
    \rput(0,0){\psframe(-.8,-.6)(.8,.6)}%
kleing@33817
   143
    \rput(0,0){\parbox{4.3em}{\centering{#1}}}}
kleing@33817
   144
kleing@33817
   145
\begin{figure}
kleing@33817
   146
\centering%
kleing@33817
   147
\begin{pspicture}(-4.8,0)(3.3,4)%\psgrid
kleing@33817
   148
    \newpsstyle{conn}{arrows=->}%
kleing@33817
   149
    %
kleing@33817
   150
    \rput(-2.2,3.3){\component{web server}}%
kleing@33817
   151
    \rput( 2.2,3.3){\component{web module}}%
kleing@33817
   152
    \rput(-2.2,0.7){\component{web client}}%
kleing@33817
   153
    \rput(-4.2,3.4){%
kleing@33817
   154
	\psellipse(0,-.2)(.4,.2)%
kleing@33817
   155
	\psframe[linestyle=none,fillstyle=solid,fillcolor=white]%
kleing@33817
   156
		(-.4,-.2)(.4,.1)%
kleing@33817
   157
	\psellipse(0,.1)(.4,.2)%
kleing@33817
   158
	\psline(-.38,-.2)(-.38,.1)\psline(.38,-.2)(.38,.1)%
kleing@33817
   159
    }%
kleing@33817
   160
    \psline[style=conn,arrows=<->](-3.0,3.3)(-3.8,3.3)%
kleing@33817
   161
    %
kleing@33817
   162
    \rput[rB](3.3,2.15){server}%
kleing@33817
   163
    \psline[linestyle=dashed](-4.8,2)(3.3,2)%
kleing@33817
   164
    \rput[rt](3.3,1.90){network}%
kleing@33817
   165
    %
kleing@33817
   166
    \rput[B](0.0,3.55){\psframebox*{module protocol}}%
kleing@33817
   167
    \psline[style=conn](-1.4,3.4)(1.4,3.4)%
kleing@33817
   168
    \psline[style=conn](1.4,3.2)(-1.4,3.2)%
kleing@33817
   169
    %
kleing@33817
   170
    \rput[B]{90}(-2.4,2.0){\psframebox*{\ac{HTTP}}}%
kleing@33817
   171
    \psline[style=conn](-2.1,2.7)(-2.1,1.3)%
kleing@33817
   172
    \psline[style=conn](-2.3,1.3)(-2.3,2.7)%
kleing@33817
   173
\end{pspicture}
kleing@33817
   174
\caption{Overview of web module architecture\label{fig:modulearch}}
kleing@33817
   175
\end{figure}
kleing@33817
   176
kleing@33817
   177
An overview of a simple web server architecture is presented in 
kleing@33817
   178
\reffig{modulearch}.
kleing@33817
   179
A \emph{web client} requests a \ac{URL} from a \emph{web server} over 
kleing@33817
   180
\ac{HTTP}.
kleing@33817
   181
The web server processes the request by fetching static elements from its 
kleing@33817
   182
local file system and communicating with \emph{web modules} to dynamically 
kleing@33817
   183
generate other elements.
kleing@33817
   184
The elements are sent back across the network to the web client.
kleing@33817
   185
kleing@33817
   186
The web server communicates with web modules over a \emph{module protocol}, 
kleing@33817
   187
which dictates a means of passing requests and receiving responses.
kleing@33817
   188
There are several common module protocols.
kleing@33817
   189
kleing@33817
   190
In the \ac{CGI}, the web server executes processes to service dynamic 
kleing@33817
   191
\acp{URL}.
kleing@33817
   192
Request details are placed in environment variables and sent on the standard 
kleing@33817
   193
input channels of processes, responses are read from the standard output 
kleing@33817
   194
channels of processes and transmitted back to web clients.
kleing@33817
   195
kleing@33817
   196
The chief disadvantage of \ac{CGI} is that it creates a new process for 
kleing@33817
   197
every request.
kleing@33817
   198
Fast \ac{CGI} web modules, on the other hand, run persistently in a loop.
kleing@33817
   199
They receive and respond to web server requests over a duplex socket.
kleing@33817
   200
The Fast \ac{CGI} protocol is quite complicated.
kleing@33817
   201
There are, however, alternatives like \ac{SCGI} that are easier for web 
kleing@33817
   202
modules to support.
kleing@33817
   203
This is important when programming in \ac{SML} because both the number of 
kleing@33817
   204
developers and available libraries are small.
kleing@33817
   205
kleing@33817
   206
An \ac{SCGI} web module listens on a socket for requests from a web server.
kleing@33817
   207
Requests are sent as stream of bytes.
kleing@33817
   208
The first part of the request is a null-delimited sequence of name and value 
kleing@33817
   209
pairs.
kleing@33817
   210
The second part is unparsed text sent from the web client.
kleing@33817
   211
The web module responds by sending text, usually \ac{HTTP} headers followed 
kleing@33817
   212
by \ac{HTML} data, back over the socket.
kleing@33817
   213
The whole protocol can be described in two 
kleing@33817
   214
pages.\footnote{\url{http://python.ca/scgi/protocol.txt}}
kleing@33817
   215
kleing@33817
   216
Both the Apache and Lighttpd web servers support the \ac{SCGI} protocol.
kleing@33817
   217
Lighttpd was chosen because it seemed to be small, fast, and easy to 
kleing@33817
   218
configure.
kleing@33817
   219
Two settings are required to connect lighttpd to an \ac{SCGI} web module: 
kleing@33817
   220
\begin{verbatim}
kleing@33817
   221
server.modules = ( "mod_scgi" )
kleing@33817
   222
scgi.server = ("/isabelle" => ((
kleing@33817
   223
		  "host" => "127.0.0.1",
kleing@33817
   224
		  "port" => 64000,
kleing@33817
   225
		  "check-local" => "disable")))
kleing@33817
   226
\end{verbatim}
kleing@33817
   227
In this example, the \texttt{scgi.server} entry directs the web server to 
kleing@33817
   228
pass all \acp{URL} beginning with \texttt{/isabelle} to the web module 
kleing@33817
   229
listening on port \texttt{64000}.
kleing@33817
   230
kleing@33817
   231
\subsection{Implementation in \acs{SML}}\label{sec:oneproc:impl}
kleing@33817
   232
kleing@33817
   233
\begin{table}
kleing@33817
   234
\begin{tabular}{lp{.70\textwidth}}
kleing@33817
   235
\textbf{Module}	     &	\textbf{Description}\\\hline
kleing@33817
   236
Mime		     &	Rudimentary support for mime types.\\
kleing@33817
   237
HttpStatus	     &	\ac{HTTP} header status codes.\\
kleing@33817
   238
HttpUtil	     &	Produce \ac{HTTP} headers and parse query strings.\\
kleing@33817
   239
Xhtml		     &	Rudimentary support for generating \ac{HTML}.\\
kleing@33817
   240
SocketUtil	     &	Routines from The Standard ML Basis Library
kleing@33817
   241
			book.\footnote{Chapter 10, Gansner and Reppy, 
kleing@33817
   242
			Cambridge University Press.}\\
kleing@33817
   243
\textbf{ScgiReq}     &	Parse \ac{SCGI} requests.\\
kleing@33817
   244
\textbf{ScgiServer}  &	\ac{SCGI} server event loop and dispatch.\\
kleing@33817
   245
\textbf{FindTheorems}&	\ac{HTML} interface to find\_theorems.
kleing@33817
   246
\end{tabular}
kleing@33817
   247
\caption{Web module implementation\label{tab:moduleimpl}}
kleing@33817
   248
\end{table}
kleing@33817
   249
kleing@33817
   250
The find\_theorems web module is implemented in \ac{SML}.
kleing@33817
   251
It uses elements of the Pure library in Isabelle.
kleing@33817
   252
The program comprises the nine modules shown in \reftab{moduleimpl}.
kleing@33817
   253
The three most important ones are shown in bold: ScgiReq, ScgiServer, and 
kleing@33817
   254
FindTheorems.
kleing@33817
   255
kleing@33817
   256
ScgiReq implements the client-side of the \ac{SCGI} protocol.
kleing@33817
   257
It maps a binary input stream into a request data type.
kleing@33817
   258
kleing@33817
   259
ScgiServer is a generic, multi-threaded \ac{SCGI} request dispatcher.
kleing@33817
   260
It accepts \ac{SCGI} requests on a designated socket, selects an appropriate 
kleing@33817
   261
handler from an internal list, and calls the handler in a new thread.
kleing@33817
   262
kleing@33817
   263
FindTheorems registers a handler with ScgiServer.
kleing@33817
   264
It parses \ac{SCGI}/\ac{HTTP} requests, calls the Isabelle find\_theorems 
kleing@33817
   265
function, and returns the results as \ac{HTML}.
kleing@33817
   266
The \ac{HTML} generation of individual theorems is handled by the \ac{HTML} 
kleing@33817
   267
print mode of Isabelle, but the form fields and page structure were manually 
kleing@33817
   268
implemented.
kleing@33817
   269
kleing@33817
   270
The server is started by calling \texttt{ScgiServer.server}.
kleing@33817
   271
Scripts have been created to automate this process.
kleing@33817
   272
kleing@33817
   273
\subsection{Handling symbols}\label{sec:unicode}
kleing@33817
   274
kleing@33817
   275
Isabelle theorems are usually written in mathematical notation.
kleing@33817
   276
Internally, however, Isabelle only manipulates \acs{ASCII} strings.
kleing@33817
   277
Symbols are encoded by strings that begin with a backslash and contain a 
kleing@33817
   278
symbol name between angle braces, for example, the symbol~$\longrightarrow$ 
kleing@33817
   279
becomes~\verb+\<longrightarrow>+.
kleing@33817
   280
The existing Thy/Html module in the Isabelle Pure theory converts many of 
kleing@33817
   281
these symbols to \ac{HTML} entities.
kleing@33817
   282
Custom routines are required to convert the missing symbols to \ac{HTML} 
kleing@33817
   283
\emph{numeric character references}, which are the Unicode codepoints of 
kleing@33817
   284
symbols printed in decimal between \verb+&#+ and \verb+;+.
kleing@33817
   285
Further, other routines were required for converting \acs{UTF-8} encoded 
kleing@33817
   286
strings sent from web browsers into Isabelle's symbol encoding.
kleing@33817
   287
kleing@33817
   288
Isabelle is distributed with a text file that maps Isabelle symbols to 
kleing@33817
   289
Unicode codepoints.
kleing@33817
   290
A module was written to parse this file into symbol tables that map back and 
kleing@33817
   291
forth between Isabelle symbols and Unicode codepoints, and also between 
kleing@33817
   292
Isabelle \acs{ASCII} abbreviations (like \verb+-->+ for $\longrightarrow$) 
kleing@33817
   293
and Unicode codepoints.
kleing@33817
   294
kleing@33817
   295
The conversion from Isabelle symbols to \ac{HTML} numeric character 
kleing@33817
   296
references is handled by a new printing mode, which is based in large part 
kleing@33817
   297
on the existing \ac{HTML} printing mode.
kleing@33817
   298
The new mode is used in combination with the existing \texttt{xsymbol} mode, 
kleing@33817
   299
to ensure that Isabelle symbols are used instead of \acs{ASCII} 
kleing@33817
   300
abbreviations.
kleing@33817
   301
kleing@33817
   302
The conversion from \acs{UTF-8} is handled by a custom routine.
kleing@33817
   303
Additionally, there is a JavaScript routine that converts from Isabelle 
kleing@33817
   304
symbol encodings to \acs{UTF-8}, so that users can conveniently view 
kleing@33817
   305
manually-entered or pasted mathematical characters in the web browser 
kleing@33817
   306
interface.
kleing@33817
   307
kleing@33817
   308
\section{Abbreviations}\label{sec:abbr}
kleing@33817
   309
kleing@33817
   310
\begin{acronym}[SML/NJ] % longest acronym here
kleing@33817
   311
    \acro{ASCII}{American Standard Code for Information Interchange}
kleing@33817
   312
    \acro{CGI}{Common Gateway Interface}
kleing@33817
   313
    \acro{CML}{Concurrent ML}
kleing@33817
   314
    \acro{CMU}{Carnegie Mellon University}
kleing@33817
   315
    \acro{HTML}{Hyper Text Markup Language}
kleing@33817
   316
    \acro{HTTP}{Hyper Text Transfer Protocol}
kleing@33817
   317
    \acro{ML}{Meta Language}
kleing@33817
   318
    \acro{SCGI}{Simple CGI}
kleing@33817
   319
    \acro{SML}{Standard ML}
kleing@33817
   320
    \acro{SML/NJ}{Standard ML of New Jersey}
kleing@33817
   321
    \acro{URL}{Universal Resource Locator}
kleing@33817
   322
    \acro{UTF-8}{8-bit Unicode Transformation Format}
kleing@33817
   323
    \acro{WWW}{World Wide Web}
kleing@33817
   324
\end{acronym}
kleing@33817
   325
kleing@33817
   326
\end{document}