doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex
changeset 55466 55c2d2ee3f92
parent 55404 ab97437e021a
child 55476 8e3f73e1e3a3
     1.1 --- a/doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex	Thu Jun 26 10:50:27 2014 +0200
     1.2 +++ b/doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex	Thu Jun 26 17:19:30 2014 +0200
     1.3 @@ -1,51 +1,198 @@
     1.4 -\chapter{Functional Programming Languages and Multicore Systems}
     1.5 +\chapter{Parallelism in Functional Programming}
     1.6  \label{cha:funproglangs_mcsystems}
     1.7 +This chapter presents several concepts for parallelism and concurrency which have been suggested for and implemented in functional programming languages. The central idea here is the philosophy of the declarative paradigm: the developer should not have to worry about {\em how} the parallel execution needs to be organized but at most make informed decisions on {\em what} has to be parallelized. Ideally, process management details such as process creation, resource assignment and synchronization should be taken care of by the compiler and the run-time system. In practice, various language properties, such as impurity and monadic I/O, may allow for language constructs from imperative programming or even require forms of explicit synchronization.
     1.8 +
     1.9 +Many of the mechanisms presented in the following sections have been implemented in and modified for certain programming languages. It seemed appropriate to discuss these realizations and modifications in the respective sections. Table \ref{tab:lang_par_overview} gives a brief overview. Note that it only contains those mechanisms and sections that mention concrete implementations.
    1.10 +\begin{table}[ht]
    1.11 +\centering
    1.12 +\caption{Overview of parallelism mechanisms and referenced programming languages.}
    1.13 +\label{tab:lang_par_overview}
    1.14 +\def\rr{\rightskip=0pt plus1em \spaceskip=.3333em \xspaceskip=.5em\relax}
    1.15 +\setlength{\tabcolsep}{1ex}
    1.16 +\def\arraystretch{1.20}
    1.17 +\setlength{\tabcolsep}{1ex}
    1.18 +\small
    1.19 +\begin{tabular}{|p{0.33\textwidth}|p{0.36\textwidth}|c|}
    1.20 +  \hline
    1.21 +  \multicolumn{1}{|c}{\em Concept} &
    1.22 +  \multicolumn{1}{|c}{\em Language} &
    1.23 +  \multicolumn{1}{|c|}{\em Section} \\
    1.24 +  \hline\hline
    1.25 +  {\rr Annotations} & {\rr \textit{Clean}} & \ref{sec:annot} \\
    1.26 +  \hline
    1.27 +  {\rr Futures and Promises} & {\rr \textit{Multilisp}, \textit{Isabelle/ML}, \textit{Scala}} & \ref{sec:futurespro}, \ref{sec:actors_scala}, \ref{sec:csp} \\
    1.28 +  \hline
    1.29 +  {\rr Parallel Combinators} & {\rr \textit{Haskell}} & \ref{sec:parcomb} \\
    1.30 +  \hline
    1.31 +  {\rr Algorithmic Skeletons} & {\rr \textit{Java}} & \ref{sec:algoskel} \\
    1.32 +  \hline
    1.33 +  {\rr Data Parallelism} & {\rr \textit{Haskell}} & \ref{sec:data_parallelism} \\
    1.34 +  \hline
    1.35 +  {\rr Soft\-ware Transactional Mem\-o\-ry} & {\rr \textit{C}, \textit{Haskell}, \textit{Java}, \textit{JavaScript}, \textit{Python}, \textit{Scala}} & \ref{sec:stm} \\
    1.36 +  \hline
    1.37 +  {\rr Actors} & {\rr \textit{Erlang}, \textit{Scala}} & \ref{sec:actormodel} \\
    1.38 +  \hline
    1.39 +  {\rr Com\-mu\-ni\-cat\-ing Sequential Pro\-ces\-ses} & {\rr \textit{occam}, \textit{Limbo}, \textit{Go}} & \ref{sec:csp} \\
    1.40 +  \hline
    1.41 +\end{tabular}
    1.42 +\end{table}
    1.43 +For consistency reasons, all sample code is provided in \textit{Standard ML}, even where the demonstrated concepts are not actually available in the language.
    1.44 +%NOTES: IMPLICIT, CONTROLLED, EXPLICIT; TRANSFORMATIONAL VS. REACTIONAL;\\
    1.45 +%MISSING: EVALUATION STRATEGIES (http://hackage.haskell.org/package/parallel-3.1.0.1/docs/Control-Parallel-Strategies.html), ACTIVE DATA STRUCTURES, PIPELINE PARALLELISM
    1.46  
    1.47  \section{Implicit Parallelism}
    1.48  \label{sec:implicit_parallelism}
    1.49 -about half page
    1.50 +Due to the lack of side effects, it should be possible to compute independent subexpressions in parallel. This means that in theory, purely functional programs can be parallelized automatically by the compiler. However, the whole process is not trivial, as section \ref{sec:par_funprog} already pointed out. A compiler that supports implicit parallelization must be able to perform granularity and cost analyses in order to achieve advantageous parallelizations and avoid those whose overhead would exceed their potential gain because they are too fine-grained. Additionally, compilers for lazy languages have to incorporate a strictness analysis, i.e. identify expressions which must necessarily be evaluated, such that they do not introduce an overhead by computing expressions whose results are not needed. An example of a function with a strict argument is the {\tt map} operation. If its list argument is undefined, the result of the whole function call will be undefined.
    1.51 +
    1.52 +
    1.53 +\section{Task Parallelism}
    1.54 +\label{sec:task_par}
    1.55 +Instead of automating parallelization, functional general-purpose languages such as \textit{Haskell} and \textit{Clean} provide special syntax to indicate that expressions are to be evaluated in parallel and thereby make theoretically implicit parallelism explicit.
    1.56 +
    1.57 +\subsection{Parallel {\tt let}-Expressions}
    1.58 +One proposed method is parallel {\tt let}-expressions \cite{hammer2007proposal}. {\tt let}-expressions are a common way in functional languages to allow the programmer to declare a list of values required for the evaluation of one final expression. The modified version {\tt letpar} has been suggested for the indication of a list of independent values which could potentially be computed in parallel. While this keyword is not available in \textit{Standard ML}, a hypothetical use could look like the expression
    1.59 +\imlcode{~~\=le\=tpar\\
    1.60 +\>\>val x = add2 3\\
    1.61 +\>\>val y = fib 8\\
    1.62 +\>\>val z = add 7 5\\
    1.63 +\>in x + y + z end; \textrm{.}}
    1.64 +
    1.65 +\subsection{Annotations}
    1.66 +\label{sec:annot}
    1.67 +Another straightforward solution is the use of annotations. This is probably the simplest way of expressing parallelism because it does not affect the semantics of a program but only its run-time behavior. By ignoring the annotations, the program can still be compiled sequentially. This method has been implemented in the programming language \textit{Clean} \cite{nocker1991concurrent}, which also allows indications of the target processing unit for the evaluation. Since eager languages evaluate a function's arguments before invoking the function itself, this mechanism, in its simple form, only works for lazy languages. A potential implementation of the parallel {\tt fib} function\footnote{see comments in appendix \ref{sec:fib-imp}} in a lazy version of \textit{Standard ML} could look somewhat like the definition
    1.68 +\imlcode{\label{alg:parannot}
    1.69 +~~\=fu\=n fib 0 = 1\\
    1.70 +\>\>| fib 1 = 1\\
    1.71 +\>\>| fib x = fib (x - 1) + par (fib (x - 2)); \textrm{.}}
    1.72 +\noindent
    1.73 +A similar construct for eager languages is slightly more involved and will be discussed in the following section.
    1.74 +
    1.75 +\subsection{Futures and Promises}
    1.76 +\label{sec:futurespro}
    1.77 +The concepts of {\em promises} and {\em futures} were first suggested in the mid 1970s \cite{baker1977incremental}. The idea is similar to annotations (see previous section). One of the first languages that adopted futures was called {\em Multilisp} \cite{halstead1985multilisp}. As the {\em Isabelle} theorem prover provides its own implementation of futures \cite{matthews2010efficient} which have been introduced to \isac{} during the parallelization process (section \ref{ssec:session-manag}), they are of special interest for this thesis. Futures are objects representing the result of an expression which may not be known initially because its computation is managed by the run-time system and may occur in parallel. {\em Implicit} futures are usually an integral part of a language and treated like ordinary references. If results are not available the first time they are requested, program execution stops until the future value has been obtained. In contrast, {\em explicit} futures require programmers to manually enforce their computations before they can access their results. {\em Isabelle}'s solution follows the latter approach and will be discussed in more detail in section \ref{sec:isafut}. The definition
    1.78 +\imlcode{\label{alg:futures}
    1.79 +~~\=fu\=n \=fi\=b \=0 = 1\\
    1.80 +\>\>| fib 1 = 1\\
    1.81 +\>\>| fib x = let\\
    1.82 +\>\>\>\>\>fun fib' () = fib (x - 2)\\
    1.83 +\>\>\>\>\>val fibf = Future.fork fib'\\
    1.84 +\>\>\>\>in fib (x - 1) + (Future.join fibf) end;}
    1.85 +\noindent
    1.86 +demonstrates the use of futures in \textit{Isabelle/ML}\footnote{see comments in appendix \ref{sec:fib-imp}}. The function \texttt{Future.fork} accepts one argument which must be a function of type \texttt{unit -> 'a}, i.e. one which accepts an empty argument ``\texttt{()}'' and returns an arbitrary type. \texttt{Future.join} requires a future value as its argument and returns the result contained in it.
    1.87 +
    1.88 +As with general evaluation strategies, futures can be determined eagerly or lazily, i.e. their computation can be started on creation of the data structure or on demand. They are read-only placeholders. In contrast, promises are the single-assignment containers which hold the future's value. Futures and promises are closely linked to Communicating Sequential Processes (section \ref{sec:csp}) and also to the Actor model (section \ref{sec:actors_scala}) and are therefore available e.g. in \textit{Scala} \cite{akka2014futures}. Although their adoption originated in functional programming, their use is not limited to this paradigm.
    1.89 +%http://en.wikipedia.org/wiki/Task_parallelism, Scala, Isabelle
    1.90 +
    1.91 +\subsection{Parallel Combinators}
    1.92 +\label{sec:parcomb}
    1.93 +\textit{Haskell}'s approach to task parallelism is the use of parallel combinators. Its run-time system manages a queue of tasks, so-called {\em sparks}, whose evaluation occurs as soon as an idle CPU is detected. The language provides the keyword {\tt par}, which accepts two arguments, ``sparks'' the evaluation of the first and returns the computed result of the second. For more details see e.g. \cite{marlow2009runtime}.
    1.94 +%CAN BE EXTENDED
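Mirrored in \textit{Standard ML}, where {\tt par} is a hypothetical combinator modelled after its \textit{Haskell} counterpart and laziness is imitated by thunks, the {\tt fib} function could be written as
\imlcode{~~\=fu\=n fib 0 = 1\\
\>\>| fib 1 = 1\\
\>\>| fib x = let\\
\>\>\>\>val y = fn () => fib (x - 2)\\
\>\>\>in par y (fn () => fib (x - 1) + y ()) end; \textrm{.}}
\noindent
Here {\tt par} would spark the evaluation of its first argument and return the result of the second; in a lazy language the sparked value would be shared rather than recomputed by the call {\tt y ()}.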
    1.95 +
    1.96 +\subsection{Algorithmic Skeletons}
    1.97 +\label{sec:algoskel}
    1.98 +The idea of algorithmic skeletons is simple and not limited to parallelism. Most parallel algorithms show common patterns. These patterns can be abstracted into higher-order functions. A well-known example is {\em divide-and-conquer}:
    1.99 +\imlcode{~~\=fu\=n \=da\=c \=is\=\_trivial solve divide conquer x =\\
   1.100 +\>\>\>if is\_trivial x then\\
   1.101 +\>\>\>\>solve x\\
   1.102 +\>\>\>else\\
   1.103 +\>\>\>\>divide x\\
   1.104 +\>\>\>\>\>|> map (dac is\_trivial solve divide conquer)\\
   1.105 +\>\>\>\>\>|> conquer; \textrm{,}}
   1.106 +\noindent
   1.107 +where \texttt{x |> f} is syntactic sugar for \texttt{f x} and \texttt{is\_trivial} is a function with one argument and a \texttt{bool} return value. In order to parallelize the algorithm, we only need to use a parallel version of {\tt map}. Other common patterns include {\em branch-and-bound} and {\em dynamic programming}. While this method is also available for imperative languages such as \textit{Java}, the fact that functional languages treat higher-order functions as first-class citizens makes them particularly suitable for the use of skeletons. Algorithmic skeletons hide the orchestration between sequential portions of code from the programmer. Sometimes the compiler can provide low-level support for the skeletons. Furthermore, multiple primitive skeletons can be composed to produce more powerful patterns. Unlike other high-level parallelism mechanisms, they are amenable to cost modelling \cite{hammond2000research}.
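As a usage example, a merge sort can be obtained by instantiating the skeleton, where the hypothetical helpers {\tt halve} (splitting a list into two halves) and {\tt merge} (merging two sorted lists) are assumed:
\imlcode{~~\=fu\=n msort xs = dac\\
\>\>\>(fn xs => length xs <= 1) (fn xs => xs)\\
\>\>\>halve (fn [l, r] => merge l r) xs; \textrm{.}}
\noindent
Substituting a parallel {\tt map} inside {\tt dac} would parallelize {\tt msort} without any change to this definition.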
   1.108  
   1.109  
   1.110  \section{Data Parallelism}
   1.111  \label{sec:data_parallelism}
   1.112 -about 1 page
   1.113 +% COULD BE EXTENDED (nested parallelism details, active data structures, Scala parallel collections)
   1.114 +Many operations require the application of the same function to multiple items of a data set. This can generally occur in parallel. With algorithms of this kind, the parallel portion naturally depends on the problem size. The same issues we encountered before apply here: the granularity often is too fine and the overhead exceeds the benefits of parallel processing. Most applications of data parallelism are based on array structures and early implementations specialized in highly parallel, high-performance and scientific computing \cite{hammond2000research}. Recent developments in GPU hardware and the arrival of general-purpose GPU programming thanks to projects such as {\em CUDA} and {\em Accelerator} \cite{tarditi2006accelerator} have fueled data parallelism research during the last few years \cite{nickolls2010gpu}. Distributed, parallel algorithms like Google's {\em MapReduce} should be mentioned here. However, we will not discuss distributed parallelism in this thesis.
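In a hypothetical data parallel library for \textit{Standard ML}, such an operation could look like an ordinary {\tt map} whose distribution of the elements across processing units is left to the run-time system (the structure {\tt ParVector} is an assumption):
\imlcode{~~\=val squares = ParVector.map (fn x => x * x)\\
\>\>\>\>(ParVector.fromList [1, 2, 3, 4, 5]); \textrm{.}}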
   1.115  
   1.116 -
   1.117 -\section{Algorithmic Skeletons}
   1.118 -\label{sec:algoskel}
   1.119 -about 3 pages
   1.120 -
   1.121 -
   1.122 -\section{Futures and Promises}
   1.123 -\label{sec:futurespro}
   1.124 -about 2 pages
   1.125 +In 1990, Blelloch and Sabot first made a distinction between {\em flat} and {\em nested} parallelism \cite{blelloch90compiling}. Until then, data parallelism implementations had only considered sequential functions for parallel application on data sets. Nested data parallelism is significantly more complex in that it allows the application of functions which themselves are parallel. More than 15 years later, this functionality was suggested and implemented as an extension to the \textit{Glasgow Haskell Compiler} called {\em Data Parallel Haskell} \cite{peytonjones2008harnessing}.
   1.126  
   1.127  
   1.128  \section{Concurrent Functional Programming}
   1.129  \label{sec:concurrentfunprog}
   1.130 +As we saw in section \ref{sec:parallelism}, concurrency describes two or more tasks whose lifespans may overlap, while parallelism exploits multiple processing units by executing multiple tasks simultaneously. Section \ref{sec:concurresp} outlined how concurrency can be beneficial by hiding latencies and improving responsiveness. Now we want to explore concurrency models and control mechanisms in functional programming.
   1.131 +% could further be extended by process calculi
   1.132  
   1.133  \subsection{Software Transactional Memory}
   1.134 -about 2 pages
   1.135 +\label{sec:stm}
   1.136 +While hardware support for transactions had been introduced before and they were a fundamental aspect of fault tolerance in the context of database systems, software-only transactional memory was first suggested by Shavit and Touitou in 1997 \cite{shavit1997software}. Unlike {\em lock-based synchronization}, STM generally is a non-blocking strategy. Transactions are sequences of read and write operations on a shared memory. They are tagged atomic blocks, processed optimistically, regardless of other threads potentially performing alterations to the memory simultaneously. Each access to the memory is logged and once a transaction is finished, a final validation step verifies that no other thread modified any of the elements which were read during the process, i.e. the transaction's view of the memory was consistent. If there was no conflicting write access, the transaction is committed to the memory. Otherwise, the whole process is repeated until no more conflicts arise. In order to be correct, transactions must be linearizable, i.e. it must be possible to arrange them in some sequence without temporal overlaps whose execution would leave the shared memory in the same state as the actual, concurrent execution \cite{herlihy1990linearizability}. Because of their non-blocking nature, STMs do not cause deadlocks or priority inversion. They are inherently fault tolerant to a certain extent, because transactions can be undone in the event of a timeout or an exception. Also, they are free from the tradeoffs between lock granularity and concurrency.
   1.137 +
   1.138 +Recent work on {\em commitment ordering} has helped reduce the number of transaction conflicts and thereby improve performance of STM implementations \cite{ramadan2009committing,zhang2010software}. Also, there are various STM solutions that do in fact use locking mechanisms during different phases of a transaction. They can potentially reduce the number of rollbacks and consequently also make process coordination faster \cite{ennals2006software}. The significant difference between these approaches and conventional, mutex-based thread synchronization mechanisms is the fact that with STM, any locking is invisible to the programmer and taken care of by the run-time system. The downside of STM is the performance overhead caused by transaction management which typically impairs the performance of programs with a low number of threads, such that they are slower than when implemented in a sequential manner. STMs, however, scale particularly well with respect to higher thread and processor numbers \cite{saha2006mcrt}.
   1.139 +
   1.140 +Several implementations for various programming languages including \textit{C}, \textit{Java}, \textit{Scala}, \textit{Python} and even \textit{JavaScript} are available. \textit{Concurrent Haskell} offers multiple language constructs built on the basis of STM to allow for composable, nested, alternative and blocking transactions as well as STM exceptions \cite{harris2005composable}.
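A theoretical STM library for \textit{Standard ML}, modelled after \textit{Concurrent Haskell}'s transactional variables ({\tt TVar}s) and its {\tt atomically} construct, could express a bank transfer as a composable transaction:
\imlcode{~~\=fu\=n transfer from to amount = atomically (fn () =>\\
\>\>(TVar.write from (TVar.read from - amount);\\
\>\>~TVar.write to (TVar.read to + amount)));}
\noindent
If a concurrent transaction modified either account balance in the meantime, the final validation would fail and the function passed to {\tt atomically} would simply be re-executed.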
   1.141 +%An example use of STM in StandardML could look like snippet \ref{alg:stm}.
   1.142 +%\begin{MLCode}[alg:stm]{Theoretical Software Transactional Memory use in \textit{Standard ML}}
   1.143 +%\end{MLCode}
   1.144 +
   1.145 +%CML similar approach par_conc_ml.pdf
   1.146 +%http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory
   1.147  
   1.148  \subsection{The Actor Model}
   1.149 -about 3 pages
   1.150 +\label{sec:actormodel}
   1.151 +In 1973, Hewitt, Bishop and Steiger suggested a modular actor formalism \cite{hewitt1973universal}, which laid the basis of what is now known as the {\em Actor model}. Several publications of the following decade formed a comprehensive Actor model theory. The two central concepts of the Actor model are {\em actors} and {\em messages} \cite{agha1985actors}. Actors are entities that can send and receive messages. Their behavior is defined by the {\bf actions} they perform on receipt of a message (e.g. create another actor) and a finite number of other actors they know about, called {\bf acquaintances}. The acquaintance relation is not necessarily bidirectional, i.e. an acquaintance $B$ of actor $A$ does not necessarily know actor $A$. Each actor has an immutable, unique name and a local state. Communication between actors is asynchronous. Incoming messages are processed one at a time and as a response to each message an atomic sequence of actions is performed. Since actors and messages are autonomous, encapsulated entities, they are easy to reason about and can be composed into more complex systems \cite{agha1997foundation}. The order of actions affects an actor's external behavior. But because messaging occurs asynchronously, the processing order of messages is undefined. This leads to nondeterminism in computations based on the actor system. There are certain abstractions available which allow for constraints on message ordering and can therefore eliminate part of the nondeterminism \cite{karmani2009actor}. The key characteristics of the Actor model can be summarized in four important properties:
   1.152 +\begin{enumerate}
   1.153 +\item {\bf Encapsulation.} Unlike shared memory thread models, actors are highly encapsulated. Therefore, they provide no locking or synchronization mechanisms for shared resources. Their internal states are not directly accessible by other actors. The only way for them to exchange data or influence each other is by means of messages. This encapsulation, however, is violated by some implementations in languages such as \textit{Scala} \cite{thurauakka}. In programming languages which permit mutable data, messages should, in theory, not transfer references to locations in a shared address space. Otherwise, the receiving party could modify memory locations that are contained in another actor's state representation. However, the actor library {\em Akka} allows actors to do just that, and it is the programmer's responsibility to maintain encapsulation.
   1.154 +\item {\bf Fairness.} Another important property is fairness with respect to actor sched\-ul\-ing: a message will be delivered to its target actor eventually, i.e. unless an actor dies or displays faulty behavior, it will be scheduled in a fair manner.
   1.155 +\item {\bf Location Transparency.} This property demands that an actor's unique name is independent of its location, i.e. its assigned CPU and memory can potentially belong to a remote machine and even be migrated at run time.
   1.156 +\item {\bf Mobility.} This ability to migrate can enable load balancing and improved fault tolerance. However, implementations do not necessarily satisfy these properties (e.g. \textit{Scala}). {\em Weak mobility} allows code or initial states to be moved, while {\em strong mobility} additionally supports execution state migration \cite{fuggetta1998understanding}. It has been shown that the presence of mobility can significantly improve an architecture's scalability \cite{panwar1994methodology}.
   1.157 +\end{enumerate}
   1.158 +To demonstrate the use of a theoretical actor library for \textit{Standard ML}, let us define a trivial function \texttt{greet} and execute it in the background:
   1.159 +\imlcode{~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
   1.160 +\>val greeter = Actor.create greet;\\
   1.161 +\>Actor.send greeter "John Doe";}
   1.162  
   1.163 -\subsection{Communication Sequential Processes}
   1.164 -about 1 page
   1.165 +\subsubsection{\textit{Scala}}
   1.166 +\label{sec:actors_scala}
   1.167 +\textit{Scala} is a programming language for the \textit{Java Virtual Machine} that combines concepts of object-oriented and functional programming \cite{odersky2004overview}. Its distribution comes with a toolkit for actors and message passing functionality called {\em Akka}. \textit{Scala}'s standard actors library up to version 2.9.2 \cite{scala2014migration} has now been deprecated and replaced by \textit{Akka}. Its implementation is based on \textit{Erlang}-like, fault-tolerant actors and its execution model follows an {\em event-based} approach, thus introducing {\em inversion of control}. Compared to \textit{Scala}'s original actors, \textit{Akka} actors are faster, more fault tolerant and offer automatic load balancing \cite{tasharofi2013scala}. They draw on the language's functional programming features such as partial functions and, like \textit{Erlang}, pattern matching, which are convenient for message passing. Actors potentially provide safer concurrency than traditional shared memory mechanisms because, by design, actors share no mutable state, which rules out low-level data races. Also, actors are more lightweight than threads.
   1.168 +% why no race conditions!?!?!?
   1.169  
   1.170 +\subsubsection{Futures}
   1.171 +As mentioned in section \ref{sec:futurespro}, futures are related to the Actor model. \textit{Akka} provides a {\tt Future} class whose constructor accepts a block ({\tt \{...\}}) $b$ and returns immediately with a future value which is a placeholder for the computation's result. $b$ is then executed asynchronously by a thread pool managed by the run-time environment \cite{akka2014futures}. Instead of creating futures directly, \textit{Akka}'s actors have a method {\tt ?} that one can use to send a message. The method then returns a future value whose calculation can be synchronized and retrieved using {\tt Await.result}. There are further functions for composition, combination or ordering of futures as well as exception and error handling. The library also supports direct access to {\tt Promises}, the containers that complete futures.
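In keeping with this thesis' convention, the ask pattern could be imitated by the theoretical \textit{Standard ML} actor library from section \ref{sec:actormodel}, where {\tt Actor.ask} is an assumed analogue of \textit{Akka}'s {\tt ?} that pairs a message with a future for the reply:
\imlcode{~~\=val fut = Actor.ask greeter "John Doe";\\
\>val reply = Future.join fut;}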
   1.172  
   1.173 -\section{Implications on Software Design}
   1.174 -\label{sec:implications_swdesign}
   1.175 -about 3 pages
   1.176 +\subsection{Communicating Sequential Processes}
   1.177 +\label{sec:csp}
   1.178 +CSP (Communicating Sequential Processes) belong to the family of {\em process calculi} and are another model of concurrency. Process calculi are of special interest for functional programming in that they support a declarative notion of concurrency. CSP were first proposed in an acclaimed paper by Hoare in 1978 \cite{hoare1978communicating} and originally designed for reasoning about concurrent processes. Like the Actor model they utilize message passing for communication. Implementations include {\em occam} \cite{INMOS84occam}, {\em Limbo} \cite{dorward1997programming} and {\em Go} by Google \cite{pike2012go}. The main difference from the Actor model is the use of {\em channels} in CSP. Actors communicate directly with each other while CSP send and receive messages on directed channels. Another distinction is the fact that in the original CSP model, messages are delivered synchronously and instantaneously. This means that when a process sends a message to a channel, it waits until another process has received it and the receiving party blocks until there is a message available on the channel. Processes are anonymous and need to establish channels between each other in order to be able to communicate, which involves a rendezvous, i.e. an exchange of the receiving and sending ends of channels. While the basic proposal suggested point-to-point channels, this can also be extended to allow multiple processes to know and use a channel's sending and/or receiving end. The developers of {\em Go} have chosen an approach that differs slightly from the original CSP model: Channels are asynchronous and buffered. They are declared to transmit a specific da\-ta\-type. Interestingly, channels are first class values. As a consequence, they can also be transmitted via channels. Using \textit{Poly/ML}'s thread library, we can utilize a hypothetical CSP library to achieve the same behavior we already saw with actors and invoke \texttt{greet} asynchronously:
   1.179 +\imlcode{\label{alg:csp}
   1.180 +~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
   1.181 +\>val (c\_in, c\_out) = Channel.create () : string channel;\\
   1.182 +\>fun greet\_csp () = Channel.read c\_out |> greet;\\
   1.183 +\>Thread.fork (greet\_csp, []);\\
   1.184 +\>Channel.write c\_in "John Doe";}
   1.185  
   1.186 +\subsubsection{Futures}
   1.187 +CSP, too, can simulate futures (section \ref{sec:futurespro}): A future is a channel that can transmit exactly one element and the corresponding promise is another process that sends the result of a computation to the channel.
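Using the hypothetical CSP library from above, a future could thus be sketched as a function that forks the promised computation and hands out the receiving end of a fresh channel:
\imlcode{~~\=fu\=n future f =\\
\>\>let val (c\_in, c\_out) = Channel.create ()\\
\>\>in\\
\>\>\>Thread.fork (fn () => Channel.write c\_in (f ()), []);\\
\>\>\>c\_out\\
\>\>end;}
\noindent
Joining such a future is then a single {\tt Channel.read}, which blocks until the result has been sent.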
   1.188  
   1.189 -\section{Refactoring Functional Programs for Multicore Systems}
   1.190 +% "Squeak: a Language for Communicating with Mice"
   1.191 +% "Newsqueak: a Language for Communicating with Mice"
   1.192 +% Go concurrency: http://www.youtube.com/watch?v=3DtUzH3zoFo
   1.193 +%   slides: http://go-lang.cat-v.org/talks/slides/emerging-languages-camp-2010.pdf
   1.194 +
   1.195 +% CCS? pi-calculus?
   1.196 +% Clean, Broadcasting Sequential Processes
   1.197 +% about 1 page
   1.198 +
   1.199 +
   1.200 +\section{Refactoring for Parallelism}
   1.201  \label{sec:refac_funprogs4mcsystems}
   1.202 +Only a few publications have been dedicated to refactorings for parallelism \cite{kennedy1991interactive,wloka2009refactoring,dig2011refactoring}, even fewer in the context of functional programming. The {\em ParaForming} tool for \textit{Haskell} \cite{brown2012paraforming} appears to be the first attempt, followed by very recent work on \textit{Erlang} \cite{brown2013cost} and language-independent approaches \cite{hammond2013paraphrase,brown2012language}. Due to the trend towards multi-, many- and even megacore machines, modern programming languages should support parallelism by design because the introduction of parallelism should not occur as an afterthought during later stages of the software development cycle. \textit{ParaForming} enforces a separation of concerns between business logic and implementation details usually taken care of by system programmers. It offers several additional refactorings for \textit{HaRe} (section \ref{sec:refacfunprogs}) to support efficient, coarse-grained task and data parallelism.
   1.203 +\begin{description}
   1.204 +\item[Introduce Task Parallelism.] This refactoring takes an expression $x$ from a function, embeds $x$ into an {\tt Eval} monad, sparks its evaluation using parallel combinators (section \ref{sec:parcomb}) and substitutes the occurrences of $x$ for the sparked computation $x'$.
   1.206 +\item[Introduce Data Parallelism] takes an expression $l$ which must be a list, replaces it by a {\tt parList}, i.e. a parallel list implementation supported by the standard library, and ensures that parallel versions of the operations on $l$ are used.
   1.206 +\item[Introduce Thresholding] allows for the definition of limits to the granularity of the aforementioned refactorings. Thresholding requires a designated expression $t$ and a threshold. Calls to the respective function are only evaluated in parallel if $t$ is not lower than the threshold.
   1.207 +\item[Clustering] refers to data parallelism and accepts a minimum chunk size for a parallel list. Whenever the size of a list falls below this value, the sequential versions of operations are used.
   1.208 +\end{description}
   1.209 +\noindent
   1.210 +For other refactorings, examples and a discussion of performance gains and overheads please refer to \cite{brown2012paraforming}. \textit{ParaForming} is very specific to \textit{Haskell} and its refactorings are relatively straightforward. However, they avoid common pitfalls like forgetting to substitute the use of an expression. Decisions on granularity are not made by the system, but their implementation is automated as much as possible.
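The effect of thresholding can be illustrated with the {\tt fib} function from section \ref{sec:futurespro}: below a hypothetical threshold of 20, the function falls back to its sequential form so that no sparking overhead is incurred for small arguments:
\imlcode{~~\=fu\=n fib x =\\
\>\>if x < 2 then 1\\
\>\>else if x < 20 then fib (x - 1) + fib (x - 2)\\
\>\>else let val fibf = Future.fork (fn () => fib (x - 2))\\
\>\>\>in fib (x - 1) + Future.join fibf end;}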
   1.211  
   1.212 -\subsection{Previous Work}
   1.213 -about 4 pages
   1.214 -\subsubsection{HaRe and ParaForming in Haskell}
   1.215 +\subsubsection{Wrangler}
   1.216 +In 2013, Brown et al. documented their work on an extension of {\em Wrangler}, a refactoring tool for \textit{Erlang} \cite{brown2013cost,li2006comparative}. The provided refactorings make use of skeletons (section \ref{sec:algoskel}) and thereby facilitate informed decisions, based on cost modelling, on which refactorings and skeletons to use.
   1.217  
   1.218 -\subsection{Proposed Futurework}
   1.219 -about 1 page
   1.220 +% paraforming
   1.221 +% based on rewrite rules applied to abstract syntax tree (AST)
   1.222 +% refactoring is function taking a list possible rewrite rules
   1.223 +% rewrite rules have conditions for nodes which they can be applied to
   1.224  
   1.225 +% futurework: standard ml implementation
   1.226 +% "A parallel SML compiler based on algorithmic skeletons"