% doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex
% final docu of mlehnfeld; author Walther Neuper <neuper@ist.tugraz.at>, 23 Jul 2014
% copy from https://hg.risc.uni-linz.ac.at/wneuper/mlehnfeld
% author Mathias Lehnfeld <s1210629013@students.fh-hagenberg.at>, 20 Jul 2014
\chapter{Parallelism in Functional Programming}
\label{cha:funproglangs_mcsystems}
This section presents several concepts for parallelism and concurrency which have been suggested for and implemented in functional programming languages. The central idea is the philosophy of the declarative paradigm: the developer should not have to worry about {\em how} the parallel execution is organized but at most make informed decisions on {\em what} is to be parallelized. Ideally, process management details such as process creation, resource assignment and synchronization are taken care of by the compiler and run-time system. In practice, various language properties, such as impurity and monadic I/O, may allow for language constructs from imperative programming or even require forms of explicit synchronization.

Many of the mechanisms presented in the following sections have been implemented in and modified for certain programming languages. It seemed appropriate to discuss these realizations and changes in the respective sections. Table \ref{tab:lang_par_overview} gives a brief overview. Note that it only contains those mechanisms and sections that mention concrete implementations.
\begin{table}[ht]
\centering
\caption{Overview of parallelism mechanisms and referenced programming languages.}
\label{tab:lang_par_overview}
\def\rr{\rightskip=0pt plus1em \spaceskip=.3333em \xspaceskip=.5em\relax}
\setlength{\tabcolsep}{1ex}
\def\arraystretch{1.20}
\small
\begin{tabular}{|p{0.33\textwidth}|p{0.36\textwidth}|c|}
  \hline
  \multicolumn{1}{|c}{\em Concept} &
  \multicolumn{1}{|c}{\em Language} &
  \multicolumn{1}{|c|}{\em Section} \\
  \hline\hline
  {\rr Annotations} & {\rr \textit{Clean}} & \ref{sec:annot} \\
  \hline
  {\rr Futures and Promises} & {\rr \textit{Multilisp}, \textit{Isabelle/ML}, \textit{Scala}} & \ref{sec:futurespro}, \ref{sec:actors_scala}, \ref{sec:csp} \\
  \hline
  {\rr Parallel Combinators} & {\rr \textit{Haskell}} & \ref{sec:parcomb} \\
  \hline
  {\rr Algorithmic Skeletons} & {\rr \textit{Java}} & \ref{sec:algoskel} \\
  \hline
  {\rr Data Parallelism} & {\rr \textit{Haskell}} & \ref{sec:data_parallelism} \\
  \hline
  {\rr Soft\-ware Transactional Mem\-o\-ry} & {\rr \textit{C}, \textit{Haskell}, \textit{Java}, \textit{JavaScript}, \textit{Python}, \textit{Scala}} & \ref{sec:stm} \\
  \hline
  {\rr Actors} & {\rr \textit{Erlang}, \textit{Scala}} & \ref{sec:actormodel} \\
  \hline
  {\rr Com\-mu\-ni\-cat\-ing Sequential Pro\-ces\-ses} & {\rr \textit{occam}, \textit{Limbo}, \textit{Go}} & \ref{sec:csp} \\
  \hline
\end{tabular}
\end{table}
For consistency reasons, all sample code is provided in \textit{Standard ML}, even where the demonstrated concepts are not actually available in the language.
%NOTES: IMPLICIT, CONTROLLED, EXPLICIT; TRANSFORMATIONAL VS. REACTIONAL;\\
%MISSING: EVALUATION STRATEGIES (http://hackage.haskell.org/package/parallel-3.1.0.1/docs/Control-Parallel-Strategies.html), ACTIVE DATA STRUCTURES, PIPELINE PARALLELISM


\section{Implicit Parallelism}
\label{sec:implicit_parallelism}
Due to the lack of side effects, it should be possible to compute independent subexpressions in parallel. This means that, in theory, purely functional programs can be parallelized automatically by the compiler. However, the whole process is not trivial, as section \ref{sec:par_funprog} already pointed out. A compiler that supports implicit parallelization must be able to perform granularity and cost analyses in order to achieve advantageous parallelizations and avoid those whose overhead would exceed their potential gain because they are too fine-grained. Additionally, compilers for lazy languages have to incorporate a strictness analysis, i.e. identify expressions which must necessarily be evaluated, such that they do not introduce an overhead by computing expressions whose results are not needed. An example of a function with a strict argument is the {\tt map} operation: if its list argument is undefined, the result of the whole function call is undefined.


\section{Task Parallelism}
\label{sec:task_par}
Instead of automating parallelization, functional general-purpose languages such as \textit{Haskell} and \textit{Clean} provide special syntax to indicate that expressions are to be evaluated in parallel, thereby making theoretically implicit parallelism explicit.

\subsection{Parallel {\tt let}-Expressions}
One proposed method is parallel {\tt let}-expressions \cite{hammer2007proposal}. {\tt let}-expressions are a common way in functional languages to allow the programmer to declare a list of values required for the evaluation of one final expression. The modified version {\tt letpar} has been suggested to indicate a list of independent values which could potentially be computed in parallel. While this keyword is not available in \textit{Standard ML}, a hypothetical use could look like the expression
\imlcode{~~\=le\=tpar\\
\>\>val x = add2 3\\
\>\>val y = fib 8\\
\>\>val z = add 7 5\\
\>in x + y + z end; \textrm{.}}

\subsection{Annotations}
\label{sec:annot}
Another straightforward solution is the use of annotations. This is probably the simplest way of expressing parallelism because it does not affect the semantics of a program but only its run-time behavior. By ignoring the annotations, the program can still be compiled sequentially. This method has been implemented in the programming language \textit{Clean} \cite{nocker1991concurrent}, which also allows indicating the target processing unit for the evaluation. Since eager languages evaluate a function's arguments before invoking the function itself, this mechanism, in its simple form, only works for lazy languages. A potential implementation of the parallel {\tt fib} function\footnote{see comments in appendix \ref{sec:fib-imp}} in a lazy version of \textit{Standard ML} could look somewhat like the definition
\imlcode{\label{alg:parannot}
~~\=fu\=n fib 0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = fib (x - 1) + par (fib (x - 2)); \textrm{.}}
\noindent
A similar construct for eager languages is slightly more involved and will be discussed in the following section.

\subsection{Futures and Promises}
\label{sec:futurespro}
The concepts of {\em promises} and {\em futures} were first suggested in the mid-1970s \cite{baker1977incremental}. The idea is similar to annotations (see previous section). One of the first languages to adopt futures was {\em Multilisp} \cite{halstead1985multilisp}. Since the {\em Isabelle} theorem prover provides its own implementation of futures \cite{matthews2010efficient}, which was introduced to \isac{} during the parallelization process (section \ref{ssec:session-manag}), they are of special interest for this thesis. Futures are objects representing the result of an expression whose value may not be known initially because its computation is managed by the run-time system and may occur in parallel. {\em Implicit} futures are usually an integral part of a language and treated like ordinary references. If a result is not available the first time it is requested, program execution stops until the future value has been obtained. In contrast, {\em explicit} futures require programmers to manually enforce their computation before they can access their results. {\em Isabelle}'s solution follows the latter approach and will be discussed in more detail in section \ref{sec:isafut}. The definition
\imlcode{\label{alg:futures}
~~\=fu\=n \=fi\=b \=0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = let\\
\>\>\>\>\>fun fib' () = fib (x - 2)\\
\>\>\>\>\>val fibf = Future.fork fib'\\
\>\>\>\>in fib (x - 1) + (Future.join fibf) end;}
\noindent
demonstrates the use of futures in \textit{Isabelle/ML}\footnote{see comments in appendix \ref{sec:fib-imp}}. The function \texttt{Future.fork} accepts one argument which must be a function of type \texttt{unit -> 'a}, i.e. one which accepts an empty argument ``\texttt{()}'' and returns an arbitrary type. \texttt{Future.join} requires a future value as its argument and returns the result contained in it.
As with general evaluation strategies, futures can be evaluated eagerly or lazily, i.e. their computation can be started on creation of the data structure or on demand. Futures are read-only placeholders. Promises, in contrast, are the single-assignment containers which hold a future's value. Futures and promises are closely linked to Communicating Sequential Processes (section \ref{sec:csp}) and also to the Actor model (section \ref{sec:actors_scala}) and are therefore available e.g. in \textit{Scala} \cite{akka2014futures}. Although their adoption originated in functional programming, their use is not limited to this paradigm.
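The relationship between the two concepts can be sketched in \textit{Standard ML}: a future can be built on top of a promise. The \texttt{Promise} structure with \texttt{create}, \texttt{fulfill} and \texttt{get} is a hypothetical API assumed for illustration (it does not exist in the standard basis); only \texttt{Thread.fork} corresponds to \textit{Poly/ML}'s thread library.

```sml
(* Hypothetical API: Promise.create, Promise.fulfill and Promise.get
   are assumptions, not part of any standard library. *)
fun future (f : unit -> 'a) : unit -> 'a =
  let
    val p = Promise.create ()   (* single-assignment container *)
    val _ = Thread.fork (fn () => Promise.fulfill p (f ()), [])
  in
    fn () => Promise.get p      (* blocks until the promise is fulfilled *)
  end;

val fibf = future (fn () => fib 30);  (* computation may run in parallel *)
val result = fibf ();                 (* join: wait for and read the value *)
```

The sketch shows why the promise stays hidden: the caller only ever sees the read-only placeholder returned by \texttt{future}.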
%http://en.wikipedia.org/wiki/Task_parallelism, Scala, Isabelle

\subsection{Parallel Combinators}
\label{sec:parcomb}
\textit{Haskell}'s approach to task parallelism is the use of parallel combinators. Its run-time system manages a queue of tasks, so-called {\em sparks}, whose evaluation begins as soon as an idle CPU is detected. The language provides the combinator {\tt par}, which accepts two arguments, ``sparks'' the evaluation of the first and returns the computed result of the second. For more details see e.g. \cite{marlow2009runtime}.
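Transliterated into \textit{Standard ML}, a hypothetical \texttt{par} combinator would have to take thunks, since the language is eager; \texttt{par} itself is an assumption here, not an existing library function.

```sml
(* Hypothetical combinator: par sparks its first argument and returns
   the result of the second. Note that without the sharing provided by
   Haskell's lazy thunks, the sparked result is recomputed below; a
   realistic ML version would memoize the spark. *)
fun pfib n =
  if n < 2 then n
  else par (fn () => pfib (n - 2))                    (* spark *)
           (fn () => pfib (n - 1) + pfib (n - 2));    (* main result *)
```

In \textit{Haskell}, laziness makes the sparked expression and its later use refer to the same thunk, which is what makes \texttt{par} so cheap to insert.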
%CAN BE EXTENDED

\subsection{Algorithmic Skeletons}
\label{sec:algoskel}
The idea of algorithmic skeletons is simple and not limited to parallelism. Most parallel algorithms exhibit common patterns, which can be abstracted into higher-order functions. A well-known example is {\em divide-and-conquer}:
\imlcode{~~\=fu\=n \=da\=c \=is\=\_trivial solve divide conquer x =\\
\>\>\>if is\_trivial x then\\
\>\>\>\>solve x\\
\>\>\>else\\
\>\>\>\>divide x\\
\>\>\>\>\>|> map (dac is\_trivial solve divide conquer)\\
\>\>\>\>\>|> conquer; \textrm{,}}
\noindent
where \texttt{x |> f} is syntactic sugar for \texttt{f x} and \texttt{is\_trivial} is a function with one argument and a \texttt{bool} return value. In order to parallelize the algorithm, we only need to use a parallel version of {\tt map}. Other common patterns include {\em branch-and-bound} and {\em dynamic programming}. While this method is also available in imperative languages such as \textit{Java}, the fact that functional languages treat higher-order functions as first-class citizens makes them particularly suitable for the use of skeletons. Algorithmic skeletons hide the orchestration between sequential portions of code from the programmer. Sometimes the compiler can provide low-level support for the skeletons. Furthermore, multiple primitive skeletons can be composed to produce more powerful patterns. Unlike other high-level parallelism mechanisms, they lend themselves to cost modeling \cite{hammond2000research}.
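Such a parallel \texttt{map} can be sketched with the \texttt{Future} operations from section \ref{sec:futurespro}: fork one future per element first, then join them all, so the element computations may proceed in parallel. Whether this pays off depends on task granularity; the sketch assumes the \textit{Isabelle/ML}-style \texttt{Future.fork}/\texttt{Future.join} interface shown earlier.

```sml
(* Parallel map skeleton: all element computations are forked before
   any is joined, so they can run concurrently. *)
fun par_map f xs =
  let val futures = map (fn x => Future.fork (fn () => f x)) xs
  in map Future.join futures end;

(* A parallel divide-and-conquer is then obtained simply by replacing
   map with par_map in the dac skeleton above. *)
```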


\section{Data Parallelism}
\label{sec:data_parallelism}
% COULD BE EXTENDED (nested parallelism details, active data structures, Scala parallel collections)
Many operations require the application of the same function to multiple items of a data set. This can generally occur in parallel. With this kind of algorithm, the parallel portion naturally depends on the problem size. The same issues we encountered before apply here: the granularity is often too fine and the overhead exceeds the benefits of parallel processing. Most applications of data parallelism are based on array structures, and early implementations specialized in highly parallel, high-performance and scientific computing \cite{hammond2000research}. Recent developments in GPU hardware and the arrival of general-purpose GPU programming thanks to projects such as {\em CUDA} and {\em Accelerator} \cite{tarditi2006accelerator} have fueled data parallelism research during the last few years \cite{nickolls2010gpu}. Distributed, parallel algorithms like Google's {\em MapReduce} should be mentioned here. However, we will not discuss distributed parallelism in this thesis.
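The granularity problem is commonly mitigated by chunking: each parallel task processes a whole slice of the data rather than a single element. A sketch using the \texttt{Future} operations from section \ref{sec:futurespro} (the chunk size of 1000 is an arbitrary assumption):

```sml
(* Split a list into chunks of at most n elements. *)
fun chunks _ [] = []
  | chunks n xs =
      let val k = Int.min (n, length xs)
      in List.take (xs, k) :: chunks n (List.drop (xs, k)) end;

(* Data-parallel map: one future per chunk keeps the task granularity
   coarse enough for the parallelism to pay off. *)
fun par_map_chunked f xs =
  let val futs = map (fn c => Future.fork (fn () => map f c)) (chunks 1000 xs)
  in List.concat (map Future.join futs) end;
```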

In 1990, Blelloch and Sabot first made a distinction between {\em flat} and {\em nested} parallelism \cite{blelloch90compiling}. Until then, data parallelism implementations had only considered sequential functions for parallel application on data sets. Nested data parallelism is significantly more complex in that it allows the application of functions which are themselves parallel. More than 15 years later, this functionality was suggested and implemented as an extension to the \textit{Glasgow Haskell Compiler} called {\em Data Parallel Haskell} \cite{peytonjones2008harnessing}.


\section{Concurrent Functional Programming}
\label{sec:concurrentfunprog}
As we saw in section \ref{sec:parallelism}, concurrency describes two or more tasks whose lifespans may overlap, while parallelism exploits multiple processing units by executing multiple tasks simultaneously. Section \ref{sec:concurresp} outlined how concurrency can be beneficial by hiding latencies and improving responsiveness. Now we want to explore concurrency models and control mechanisms in functional programming.
% could further be extended by process calculi

\subsection{Software Transactional Memory}
\label{sec:stm}
While hardware support for transactions had been introduced before, and transactions were a fundamental aspect of fault tolerance in the context of database systems, software-only transactional memory (STM) was first suggested by Shavit and Touitou in 1997 \cite{shavit1997software}. Unlike {\em lock-based synchronization}, STM generally is a non-blocking strategy. Transactions are sequences of read and write operations on a shared memory. They are tagged as atomic blocks and processed optimistically, regardless of other threads potentially performing alterations to the memory simultaneously. Each access to the memory is logged, and once a transaction is finished, a final validation step verifies that no other thread modified any of the elements which were read during the process, i.e. that the transaction's view of the memory was consistent. If there was no conflicting write access, the transaction is committed to the memory. Otherwise the whole process is repeated until no more conflicts arise. In order to be correct, transactions must be linearizable, i.e. there must be some sequence in which they could have been executed without temporal overlap that yields the same resulting state of the shared memory as their concurrent execution \cite{herlihy1990linearizability}. Because of their non-blocking nature, STMs do not cause deadlocks or priority inversion. They are inherently fault-tolerant to a certain extent because transactions can be undone in the event of a timeout or an exception. Also, they are free from the tradeoffs between lock granularity and concurrency.
Recent work on {\em commitment ordering} has helped reduce the number of transaction conflicts and thereby improve the performance of STM implementations \cite{ramadan2009committing,zhang2010software}. Also, there are various STM solutions that do in fact use locking mechanisms during different phases of a transaction. They can potentially reduce the number of rollbacks and consequently also make process coordination faster \cite{ennals2006software}. The significant difference between these approaches and conventional, mutex-based thread synchronization mechanisms is that with STM, any locking is invisible to the programmer and taken care of by the run-time system. The downside of STM is the performance overhead caused by transaction management, which typically impairs the performance of programs with a low number of threads, such that they are slower than a sequential implementation. STMs, however, scale particularly well to higher thread and processor numbers \cite{saha2006mcrt}.
Several implementations for various programming languages including \textit{C}, \textit{Java}, \textit{Scala}, \textit{Python} and even \textit{JavaScript} are available. \textit{Concurrent Haskell} offers multiple language constructs built on the basis of STM to allow for composable, nested, alternative and blocking transactions as well as STM exceptions \cite{harris2005composable}.
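In keeping with the convention of \textit{Standard ML} samples, an STM transaction could be sketched as follows. The API (\texttt{TVar.new}, \texttt{TVar.read}, \texttt{TVar.write}, \texttt{atomically}) is a hypothetical assumption modeled after \textit{Concurrent Haskell}'s STM; no such library exists for \textit{Standard ML}.

```sml
(* Hypothetical STM API: TVar.new, TVar.read, TVar.write and
   atomically are assumptions, modeled after Concurrent Haskell. *)
val account_a = TVar.new 100;
val account_b = TVar.new 0;

(* The transfer runs as one transaction: other threads never observe
   the intermediate state, and the block is retried on conflict. *)
fun transfer amount =
  atomically (fn () =>
    let val a = TVar.read account_a
        val b = TVar.read account_b
    in  TVar.write account_a (a - amount);
        TVar.write account_b (b + amount)
    end);
```

The point of the sketch is that no locks appear in user code: conflict detection and retry are entirely the run-time system's concern.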
%An example use of STM in StandardML could look like snippet \ref{alg:stm}.
%\begin{MLCode}[alg:stm]{Theoretical Software Transactional Memory use in \textit{Standard ML}}
%\end{MLCode}

%CML similar approach par_conc_ml.pdf
%http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory

\subsection{The Actor Model}
\label{sec:actormodel}
In 1973, Hewitt, Bishop and Steiger suggested a modular actor formalism \cite{hewitt1973universal} which laid the basis of what is now known as the {\em Actor model}. Several publications of the following decade formed a comprehensive Actor model theory. The two central concepts of the Actor model are {\em actors} and {\em messages} \cite{agha1985actors}. Actors are entities that can send and receive messages. Their behavior is defined by the {\bf actions} they perform on receipt of a message (e.g. create another actor) and a finite number of other actors they know about, called {\bf acquaintances}. The acquaintance relation is not necessarily bidirectional, i.e. an acquaintance $B$ of actor $A$ need not know actor $A$ itself. Each actor has an immutable, unique name and a local state. Communication between actors is asynchronous. Incoming messages are processed one at a time, and in response to each message an atomic sequence of actions is performed. Since actors and messages are autonomous, encapsulated entities, they are easy to reason about and can be composed into more complex systems \cite{agha1997foundation}. The order of actions affects an actor's external behavior. But because messaging occurs asynchronously, the processing order of messages is undefined. This leads to non-determinism in computations based on the actor system. Certain abstractions are available which allow for constraints on message ordering and can therefore eliminate part of the non-determinism \cite{karmani2009actor}. The semantics of the model can be summarized in four important properties:
\begin{enumerate}
\item {\bf Encapsulation.} Unlike shared memory thread models, actors are highly encapsulated. Therefore, they provide no locking or synchronization mechanisms for shared resources. Their internal states are not directly accessible by other actors. The only way for them to exchange data or influence each other is by means of messages. This principle, however, is violated by some implementations in languages such as \textit{Scala} \cite{thurauakka}. In programming languages which permit mutable data, messages should, in theory, not transfer references to locations in a shared address space; otherwise the receiving party could modify memory locations that are contained in another actor's state representation. However, the {\em Akka} actor library for the \textit{Java Virtual Machine} allows actors to do just that, and it is the programmer's responsibility to maintain encapsulation.
\item {\bf Fairness.} Another important property is fairness with respect to actor sched\-ul\-ing: a message will be delivered to its target actor eventually, i.e. unless an actor dies or displays faulty behavior, it will be scheduled in a fair manner.
\item {\bf Location Transparency.} This property demands that an actor's unique name is independent of its location, i.e. its assigned CPU and memory can potentially belong to a remote machine and even be migrated at run time.
\item {\bf Mobility.} The ability for this migration can enable load balancing and improved fault tolerance. However, implementations do not necessarily satisfy these properties (e.g. \textit{Scala}). {\em Weak mobility} allows code or initial states to be moved, while {\em strong mobility} additionally supports execution state migration \cite{fuggetta1998understanding}. It has been shown that the presence of mobility can significantly improve an architecture's scalability \cite{panwar1994methodology}.
\end{enumerate}
To demonstrate the use of a hypothetical actor library for \textit{Standard ML}, let us define a trivial function \texttt{greet} and execute it in the background:
\imlcode{~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val greeter = Actor.create greet;\\
\>Actor.send greeter "John Doe";}

\subsubsection{\textit{Scala}}
\label{sec:actors_scala}
\textit{Scala} is a programming language for the \textit{Java Virtual Machine} that combines concepts of object-oriented and functional programming \cite{odersky2004overview}. Its distribution comes with a toolkit for actors and message passing functionality called {\em Akka}. \textit{Scala}'s standard actors library, available up to version 2.9.2 \cite{scala2014migration}, has been deprecated and replaced by \textit{Akka}. Its implementation is based on \textit{Erlang}-like, fault-tolerant actors, and its execution model follows an {\em event-based} approach, thus introducing {\em inversion of control}. Compared to \textit{Scala}'s original actors, \textit{Akka} actors are more performant, more fault-tolerant and offer automatic load balancing \cite{tasharofi2013scala}. They draw on the language's functional programming features such as partial functions and, like \textit{Erlang}, pattern matching, which are convenient for message passing. Actors potentially provide safer concurrency than traditional shared memory mechanisms because, by design, no data races on shared memory can occur. Also, actors are more lightweight than threads.
% why no race conditions!?!?!?

\subsubsection{Futures}
As mentioned in section \ref{sec:futurespro}, futures are related to the Actor model. \textit{Akka} provides a {\tt Future} class whose constructor accepts a block ({\tt \{...\}}) $b$ and returns immediately with a future value which is a placeholder for the computation's result. $b$ is then executed asynchronously by a thread pool managed by the run-time environment \cite{akka2014futures}. Instead of creating futures directly, one can send a message to an \textit{Akka} actor using its {\tt ?} method, which returns a future value whose result can be awaited and retrieved using {\tt Await.result}. There are further functions for composition, combination and ordering of futures as well as exception and error handling. The library also supports direct access to {\tt Promise}s, the containers that complete futures.
\subsection{Communicating Sequential Processes}
\label{sec:csp}
CSP (Communicating Sequential Processes) belongs to the family of {\em process calculi} and is another model of concurrency. Process calculi are of special interest for functional programming in that they support a declarative notion of concurrency. CSP was first proposed in an acclaimed paper by Hoare in 1978 \cite{hoare1978communicating} and originally designed for reasoning about concurrent processes. Like the Actor model, it utilizes message passing for communication. Implementations include {\em occam} \cite{INMOS84occam}, {\em Limbo} \cite{dorward1997programming} and {\em Go} by Google \cite{pike2012go}. The main difference from the Actor model is CSP's use of a concept named {\em channels}. Actors communicate directly with each other, while CSP processes send and receive messages on directed channels. Another distinction is the fact that in the original CSP model, messages are delivered synchronously and instantaneously: when a process sends a message to a channel, it waits until another process has received it, and the receiving party blocks until there is a message available on the channel. Processes are anonymous and need to establish channels between each other before they can communicate, which involves a rendezvous, i.e. an exchange of the receiving and sending ends of channels. While the basic proposal suggested point-to-point channels, this can be extended to allow multiple processes to know and use a channel's sending and/or receiving end. The developers of {\em Go} have chosen an approach that differs slightly from the original CSP model: channels are asynchronous and buffered. They are declared to transmit a specific da\-ta\-type. Interestingly, channels are first-class values; as a consequence, they can also be transmitted via channels.
Using \textit{Poly/ML}'s thread library, we can utilize a supposed CSP library to achieve the same behavior we already saw with actors and invoke \texttt{greet} asynchronously:
\imlcode{\label{alg:csp}
~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val (c\_in, c\_out) = Channel.create () : string channel;\\
\>fun greet\_csp () = Channel.read c\_out |> greet;\\
\>Thread.fork (greet\_csp, []);\\
\>Channel.write c\_in "John Doe";}

\subsubsection{Futures}
CSP, too, can simulate futures (section \ref{sec:futurespro}): a future is a channel that can transmit exactly one element, and the corresponding promise is another process that sends the result of a computation to the channel.
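Continuing the supposed CSP library from above, this simulation can be sketched as follows; \texttt{Channel.create}, \texttt{Channel.read} and \texttt{Channel.write} remain hypothetical assumptions.

```sml
(* A future as a one-shot channel: the forked "promise" process writes
   the result exactly once; reading the channel joins the computation. *)
fun future (f : unit -> 'a) : unit -> 'a =
  let
    val (c_in, c_out) = Channel.create ()
    val _ = Thread.fork (fn () => Channel.write c_in (f ()), [])
  in
    fn () => Channel.read c_out   (* blocks until the result arrives *)
  end;
```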

% "Squeak: a Language for Communicating with Mice"
% "Newsqueak: a Language for Communicating with Mice"
% Go concurrency: http://www.youtube.com/watch?v=3DtUzH3zoFo
%   slides: http://go-lang.cat-v.org/talks/slides/emerging-languages-camp-2010.pdf

% CCS? pi-calculus?
% Clean, Broadcasting Sequential Processes
% about 1 page


\section{Refactoring for Parallelism}
\label{sec:refac_funprogs4mcsystems}
Only a few publications have been dedicated to refactoring for parallelism \cite{kennedy1991interactive,wloka2009refactoring,dig2011refactoring}, even fewer in the context of functional programming. The {\em ParaForming} tool for \textit{Haskell} \cite{brown2012paraforming} appears to be the first attempt, followed by very recent work on \textit{Erlang} \cite{brown2013cost} and language-independent approaches \cite{hammond2013paraphrase,brown2012language}. Due to the trend towards multi-, many- and even megacore machines, modern programming languages should support parallelism by design, because the introduction of parallelism should not occur as an afterthought during later stages of the software development cycle. \textit{ParaForming} enforces a separation of concerns between business logic and implementation details usually taken care of by system programmers. It offers several additional refactorings for \textit{HaRe} (section \ref{sec:refacfunprogs}) to support efficient, coarse-grained task and data parallelism.
\begin{description}
\item[Introduce Task Parallelism.] This refactoring takes an expression $x$ from a function, embeds $x$ into an {\tt Eval} monad, sparks its evaluation using parallel combinators (section \ref{sec:parcomb}) and replaces the occurrences of $x$ with the sparked computation $x'$.
\item[Introduce Data Parallelism] takes an expression $l$ which must be a list, replaces it by a {\tt parList}, i.e. a parallel list implementation supported by the standard library, and ensures that parallel versions of the operations on $l$ are used.
\item[Introduce Thresholding] allows for the definition of limits to the granularity of the aforementioned refactorings. Thresholding requires a designated expression $t$ and a threshold. Calls to the respective function are only evaluated in parallel if $t$ is not lower than the threshold.
\item[Clustering] refers to data parallelism and accepts a minimum chunk size for a parallel list. Whenever the size of a list falls below this value, the sequential versions of operations are used.
\end{description}
\noindent
For other refactorings, examples and a discussion of performance gains and overheads, please refer to \cite{brown2012paraforming}. \textit{ParaForming} is very specific to \textit{Haskell} and its refactorings are relatively straightforward. However, they avoid common pitfalls like forgetting to substitute the use of an expression. Decisions on granularity are not made by the system, but their implementation is automated as much as possible.
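The effect of {\em thresholding} can be illustrated with a \textit{Standard ML} sketch; the \texttt{Future} operations are those from section \ref{sec:futurespro}, and the threshold value of 20 is an arbitrary assumption.

```sml
(* Thresholding: below the threshold, forking a future would cost more
   than it gains, so the subproblem is computed sequentially. *)
val threshold = 20;

fun fib 0 = 0
  | fib 1 = 1
  | fib x =
      if x < threshold
      then fib (x - 1) + fib (x - 2)                       (* sequential *)
      else
        let val fibf = Future.fork (fn () => fib (x - 2))  (* parallel *)
        in fib (x - 1) + Future.join fibf end;
```

Clustering applies the same idea to data parallelism: below a minimum chunk size, the sequential list operations are used.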

\subsubsection{Wrangler}
In 2013, Brown et al. documented their work on an extension of {\em Wrangler} \cite{brown2013cost,li2006comparative}, a refactoring tool for \textit{Erlang}. The provided refactorings make use of skeletons (section \ref{sec:algoskel}) and facilitate informed decisions, based on cost models, on which refactorings and skeletons to use.

% paraforming
% based on rewrite rules applied to abstract syntax tree (AST)
% refactoring is function taking a list possible rewrite rules
% rewrite rules have conditions for nodes which they can be applied to

% futurework: standard ml implementation
% "A parallel SML compiler based on algorithmic skeletons"