% doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex
% author Walther Neuper <neuper@ist.tugraz.at>, Wed, 23 Jul 2014
% final docu of mlehnfeld
% copy from https://hg.risc.uni-linz.ac.at/wneuper/mlehnfeld
% author Mathias Lehnfeld <s1210629013@students.fh-hagenberg.at>, Sun, 20 Jul 2014
\chapter{Parallelism in Functional Programming}
\label{cha:funproglangs_mcsystems}
This section presents several concepts for parallelism and concurrency which have been suggested for and implemented in functional programming languages. The central idea here is the philosophy of the declarative paradigm: the developer should not have to worry about {\em how} the parallel execution needs to be organized but at most make informed decisions on {\em what} has to be parallelized. Ideally, process management details such as process creation, resource assignment and synchronization are taken care of by the compiler and the run-time system. In practice, various language properties, such as impurity and monadic I/O, may allow for language constructs from imperative programming or even require forms of explicit synchronization.

Many of the mechanisms presented in the following sections have been implemented in and modified for certain programming languages. It seemed appropriate to discuss these implementation details and modifications in the respective sections. Table \ref{tab:lang_par_overview} gives a brief overview. Note that it only contains those mechanisms and sections that mention concrete implementations.
\begin{table}[ht]
\centering
\caption{Overview of parallelism mechanisms and referenced programming languages.}
\label{tab:lang_par_overview}
\def\rr{\rightskip=0pt plus1em \spaceskip=.3333em \xspaceskip=.5em\relax}
\setlength{\tabcolsep}{1ex}
\def\arraystretch{1.20}
\small
\begin{tabular}{|p{0.33\textwidth}|p{0.36\textwidth}|c|}
  \hline
  \multicolumn{1}{|c}{\em Concept} &
  \multicolumn{1}{|c}{\em Language} &
  \multicolumn{1}{|c|}{\em Section} \\
  \hline\hline
  {\rr Annotations} & {\rr \textit{Clean}} & \ref{sec:annot} \\
  \hline
  {\rr Futures and Promises} & {\rr \textit{Multilisp}, \textit{Isabelle/ML}, \textit{Scala}} & \ref{sec:futurespro}, \ref{sec:actors_scala}, \ref{sec:csp} \\
  \hline
  {\rr Parallel Combinators} & {\rr \textit{Haskell}} & \ref{sec:parcomb} \\
  \hline
  {\rr Algorithmic Skeletons} & {\rr \textit{Java}} & \ref{sec:algoskel} \\
  \hline
  {\rr Data Parallelism} & {\rr \textit{Haskell}} & \ref{sec:data_parallelism} \\
  \hline
  {\rr Soft\-ware Transactional Mem\-o\-ry} & {\rr \textit{C}, \textit{Haskell}, \textit{Java}, \textit{JavaScript}, \textit{Python}, \textit{Scala}} & \ref{sec:stm} \\
  \hline
  {\rr Actors} & {\rr \textit{Erlang}, \textit{Scala}} & \ref{sec:actormodel} \\
  \hline
  {\rr Com\-mu\-ni\-cat\-ing Sequential Pro\-ces\-ses} & {\rr \textit{occam}, \textit{Limbo}, \textit{Go}} & \ref{sec:csp} \\
  \hline
\end{tabular}
\end{table}
For consistency reasons, all sample code is provided in \textit{Standard ML}, even where the demonstrated concepts are not actually available in the language.
%NOTES: IMPLICIT, CONTROLLED, EXPLICIT; TRANSFORMATIONAL VS. REACTIONAL;\\
%MISSING: EVALUATION STRATEGIES (http://hackage.haskell.org/package/parallel-3.1.0.1/docs/Control-Parallel-Strategies.html), ACTIVE DATA STRUCTURES, PIPELINE PARALLELISM

\section{Implicit Parallelism}
\label{sec:implicit_parallelism}
Due to the lack of side effects, it should be possible to compute independent subexpressions in parallel. This means that, in theory, purely functional programs can be parallelized automatically by the compiler. However, the whole process is not trivial, as section \ref{sec:par_funprog} already pointed out. A compiler that supports implicit parallelization must be able to perform granularity and cost analyses in order to achieve advantageous parallelizations and avoid those whose overhead would exceed their potential gain because they are too fine-grained. Additionally, compilers for lazy languages have to incorporate a strictness analysis, i.e. identify expressions which must necessarily be evaluated, such that they do not introduce an overhead by computing expressions whose results are not needed. An example of a function with a strict argument is the {\tt map} operation: if its list argument is undefined, the result of the whole function call is undefined.


\section{Task Parallelism}
\label{sec:task_par}
Instead of automating parallelization, functional general-purpose languages such as \textit{Haskell} and \textit{Clean} provide special syntax to indicate that expressions are to be evaluated in parallel and thereby make theoretically implicit parallelism explicit.

\subsection{Parallel {\tt let}-Expressions}
One proposed method is parallel {\tt let}-expressions \cite{hammer2007proposal}. {\tt let}-expressions are a common way in functional languages of letting the programmer declare a list of values required for the evaluation of one final expression. The modified version {\tt letpar} has been suggested for indicating a list of independent values which could potentially be computed in parallel. While this keyword is not available in \textit{Standard ML}, a hypothetical use could look like the expression
\imlcode{~~\=le\=tpar\\
\>\>val x = add2 3\\
\>\>val y = fib 8\\
\>\>val z = add 7 5\\
\>in x + y + z end; \textrm{.}}

\subsection{Annotations}
\label{sec:annot}
Another straightforward solution is the use of annotations. This is probably the simplest way of expressing parallelism because it does not affect the semantics of a program but only its run-time behavior. By ignoring the annotations, the program can still be compiled sequentially. This method has been implemented in the programming language \textit{Clean} \cite{nocker1991concurrent}, which also allows indicating the target processing unit for the evaluation. Since eager languages evaluate a function's arguments before invoking the function itself, this mechanism, in its simple form, only works for lazy languages. A potential implementation of the parallel {\tt fib} function\footnote{see comments in appendix \ref{sec:fib-imp}} in a lazy version of \textit{Standard ML} could look somewhat like the definition
\imlcode{\label{alg:parannot}
~~\=fu\=n fib 0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = fib (x - 1) + par (fib (x - 2)); \textrm{.}}
\noindent
A similar construct for eager languages is slightly more involved and will be discussed in the following section.

\subsection{Futures and Promises}
\label{sec:futurespro}
The concepts of {\em promises} and {\em futures} were first suggested in the mid 1970s \cite{baker1977incremental}. The idea is similar to annotations (see previous section). One of the first languages to adopt futures was {\em Multilisp} \cite{halstead1985multilisp}. As the {\em Isabelle} theorem prover provides its own implementation of futures \cite{matthews2010efficient}, which were introduced to \isac{} during the parallelization process (section \ref{ssec:session-manag}), they are of special interest for this thesis. Futures are objects representing the result of an expression; this result may not be known initially because its computation is managed by the run-time system and may occur in parallel. {\em Implicit} futures are usually an integral part of a language and treated like ordinary references. If a result is not available the first time it is requested, program execution stops until the future value has been obtained. In contrast, {\em explicit} futures require programmers to manually enforce their computation before they can access their results. {\em Isabelle}'s solution follows the latter approach and will be discussed in more detail in section \ref{sec:isafut}. The definition
\imlcode{\label{alg:futures}
~~\=fu\=n \=fi\=b \=0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = let\\
\>\>\>\>\>fun fib' () = fib (x - 2)\\
\>\>\>\>\>val fibf = Future.fork fib'\\
\>\>\>\>in fib (x - 1) + (Future.join fibf) end;}
\noindent
demonstrates the use of futures in \textit{Isabelle/ML}\footnote{see comments in appendix \ref{sec:fib-imp}}. The function \texttt{Future.fork} accepts one argument which must be a function of type \texttt{unit -> 'a}, i.e. one which accepts an empty argument ``\texttt{()}'' and returns an arbitrary type. \texttt{Future.join} requires a future value as its argument and returns the result contained in it.

As with general evaluation strategies, futures can be evaluated eagerly or lazily, i.e. their computation can be started on creation of the data structure or on demand. Futures are read-only placeholders. Promises, in contrast, are the single-assignment containers which hold a future's value. Futures and promises are closely linked to Communicating Sequential Processes (section \ref{sec:csp}) and also to the Actor model (section \ref{sec:actors_scala}) and are therefore available e.g. in \textit{Scala} \cite{akka2014futures}. Although their adoption originated in functional programming, their use is not limited to this paradigm.
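The distinction between the two concepts can be sketched with a hypothetical extension of the futures interface: assuming functions \texttt{Future.promise} and \texttt{Future.fulfill} (the names are assumptions for illustration), the creation of the placeholder is separated from the delivery of its value, as in
\imlcode{~~\=val p = Future.promise ();\\
\>fun deliver () = Future.fulfill p (fib 30);\\
\>Future.join p; \textrm{.}}
\noindent
Here \texttt{Future.join p} blocks until some other thread has executed \texttt{deliver}, which assigns the promise exactly once.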
%http://en.wikipedia.org/wiki/Task_parallelism, Scala, Isabelle

    86 \subsection{Parallel Combinators}
    87 \label{sec:parcomb}
    88 \textit{Haskell}'s approach to task parallelism is the use of parallel combinators. Its run-time system manages a queue of tasks, so called {\em sparks}, whose evaluation occurs as soon as an idle CPU is detected. The language provides the keyword {\tt par}, which accepts two arguments, ``sparks'' the evaluation of the first and returns the computed result of the second. For more details see e.g. \cite{marlow2009runtime}.
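Transliterated into a hypothetical lazy dialect of \textit{Standard ML}, with \texttt{par} taking the role of \textit{Haskell}'s combinator, the {\tt fib} example from section \ref{sec:annot} might read
\imlcode{~~\=fu\=n fib 0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = let val y = fib (x - 2)\\
\>\>\>in par y (fib (x - 1) + y) end; \textrm{,}}
\noindent
where \texttt{par y e} sparks the evaluation of \texttt{y} and immediately continues with \texttt{e}; under lazy evaluation, \texttt{y} is still an unevaluated suspension at that point, so the spark and the main computation can proceed in parallel.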
%CAN BE EXTENDED

\subsection{Algorithmic Skeletons}
\label{sec:algoskel}
The idea of algorithmic skeletons is simple and not limited to parallelism. Most parallel algorithms exhibit common patterns. These patterns can be abstracted into higher-order functions. A well-known example is {\em divide-and-conquer}:
\imlcode{~~\=fu\=n \=da\=c \=is\=\_trivial solve divide conquer x =\\
\>\>\>if is\_trivial x then\\
\>\>\>\>solve x\\
\>\>\>else\\
\>\>\>\>divide x\\
\>\>\>\>\>|> map (dac is\_trivial solve divide conquer)\\
\>\>\>\>\>|> conquer; \textrm{,}}
\noindent
where \texttt{x |> f} is syntactic sugar for \texttt{f x} and \texttt{is\_trivial} is a function with one argument and a \texttt{bool} return value. In order to parallelize the algorithm, we only need to use a parallel version of {\tt map}. Other common patterns include {\em branch-and-bound} and {\em dynamic programming}. While this method is also available in imperative languages such as \textit{Java}, the fact that functional languages treat higher-order functions as first-class citizens makes them particularly suitable for the use of skeletons. Algorithmic skeletons hide the orchestration between sequential portions of code from the programmer. Sometimes the compiler can provide low-level support for the skeletons. Furthermore, multiple primitive skeletons can be composed to produce more powerful patterns. Unlike other high-level parallelism mechanisms, they lend themselves to cost modeling \cite{hammond2000research}.
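To instantiate the skeleton, only the four parameter functions need to be supplied. A merge sort on integer lists, for example, can be expressed as
\imlcode{~~\=fu\=n \=merge ([], ys) = ys\\
\>\>| merge (xs, []) = xs\\
\>\>| merge (x :: xs, y :: ys) =\\
\>\>\>if x <= y then x :: merge (xs, y :: ys)\\
\>\>\>else y :: merge (x :: xs, ys);\\
\>fun msort (xs : int list) = dac\\
\>\>(fn l => length l <= 1) (fn l => l)\\
\>\>(fn l => [List.take (l, length l div 2),\\
\>\>~~~~~~~~List.drop (l, length l div 2)])\\
\>\>(fn [a, b] => merge (a, b)) xs; \textrm{,}}
\noindent
where the four arguments correspond to \texttt{is\_trivial}, \texttt{solve}, \texttt{divide} and \texttt{conquer}, respectively. Replacing the {\tt map} inside \texttt{dac} by a parallel version parallelizes \texttt{msort} without touching this code.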


\section{Data Parallelism}
\label{sec:data_parallelism}
% COULD BE EXTENDED (nested parallelism details, active data structures, Scala parallel collections)
Many operations require the application of the same function to multiple items of a data set. This can generally occur in parallel. With this kind of algorithm, the parallel portion naturally depends on the problem size. The same issues apply that we encountered before: the granularity is often too fine and the overhead exceeds the benefits of parallel processing. Most applications of data parallelism are based on array structures, and early implementations specialized in highly parallel, high-performance and scientific computing \cite{hammond2000research}. Recent developments in GPU hardware and the arrival of general-purpose GPU programming thanks to projects such as {\em CUDA} and {\em Accelerator} \cite{tarditi2006accelerator} have fueled data parallelism research during the last few years \cite{nickolls2010gpu}. Distributed, parallel algorithms like Google's {\em MapReduce} should be mentioned here. However, we will not discuss distributed parallelism in this thesis.

In 1990, Blelloch and Sabot first made a distinction between {\em flat} and {\em nested} parallelism \cite{blelloch90compiling}. Until then, data parallelism implementations had only considered sequential functions for parallel application on data sets. Nested data parallelism is significantly more complex in that it allows the application of functions which are themselves parallel. More than 15 years later, this functionality was suggested and implemented as an extension to the \textit{Glasgow Haskell Compiler} called {\em Data Parallel Haskell} \cite{peytonjones2008harnessing}.
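The distinction can be illustrated with a hypothetical parallel map \texttt{parmap}, which applies a function to all elements of a list in parallel. Flat data parallelism restricts the mapped function to being sequential, while nested data parallelism also permits expressions such as the second one below:
\imlcode{~~\=val flat = parmap (fn x => x * x) [1, 2, 3, 4];\\
\>val nested = parmap (parmap (fn x => x * x))\\
\>\>~~[[1, 2], [3, 4], [5, 6]]; \textrm{.}}
\noindent
In the nested case, the inner \texttt{parmap} calls are themselves parallel computations which the run-time system must flatten and schedule.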


\section{Concurrent Functional Programming}
\label{sec:concurrentfunprog}
As we saw in section \ref{sec:parallelism}, concurrency describes two or more tasks whose lifespans may overlap, while parallelism is capable of exploiting multiple processing units by executing multiple tasks simultaneously. Section \ref{sec:concurresp} outlined how concurrency can be beneficial by hiding latencies and improving responsiveness. Now we want to explore concurrency models and control mechanisms in functional programming.
% could further be extended by process calculi

\subsection{Software Transactional Memory}
\label{sec:stm}
While hardware support for transactions had been introduced before, and transactions were a fundamental aspect of fault tolerance in the context of database systems, software-only transactional memory was first suggested by Shavit and Touitou in 1997 \cite{shavit1997software}. Unlike {\em lock-based synchronization}, STM is generally a non-blocking strategy. Transactions are sequences of read and write operations on a shared memory. They are tagged as atomic blocks and processed optimistically, regardless of other threads potentially performing other alterations to the memory simultaneously. Each access to the memory is logged, and once a transaction is finished, a final validation step verifies that no other thread modified any of the elements which were read during the process, i.e. that the transaction's view of the memory was consistent. If there was no conflicting write access, the transaction is committed to the memory. Otherwise the whole process is repeated until no more conflicts arise. In order to be correct, transactions must be linearizable, i.e. it must be possible to execute them in some sequence without temporal overlaps, such that the resulting state of the shared memory is the same as if the transactions had been carried out concurrently \cite{herlihy1990linearizability}. Because of their non-blocking nature, STMs do not cause deadlocks or priority inversion. They are inherently fault-tolerant to a certain extent, because transactions can be undone in the event of a timeout or an exception. Also, they are free from the tradeoffs between lock granularity and concurrency.

Recent work on {\em commitment ordering} has helped reduce the number of transaction conflicts and thereby improve the performance of STM implementations \cite{ramadan2009committing,zhang2010software}. Also, there are various STM solutions that do in fact use locking mechanisms during different phases of a transaction. They can potentially reduce the number of rollbacks and consequently also make process coordination faster \cite{ennals2006software}. The significant difference between these approaches and conventional, mutex-based thread synchronization mechanisms is that with STM, any locking is invisible to the programmer and taken care of by the run-time system. The downside of STM is the performance overhead caused by transaction management, which typically impairs the performance of programs with a low number of threads, such that they are slower than sequential implementations. STMs, however, scale particularly well to higher thread and processor numbers \cite{saha2006mcrt}.

Several implementations for various programming languages including \textit{C}, \textit{Java}, \textit{Scala}, \textit{Python} and even \textit{JavaScript} are available. \textit{Concurrent Haskell} offers multiple language constructs built on the basis of STM to allow for composable, nested, alternative and blocking transactions as well as STM exceptions \cite{harris2005composable}.
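A hypothetical STM library for \textit{Standard ML}, loosely modeled on the interface of \textit{Concurrent Haskell} (the structure \texttt{STM} and its functions are assumptions for illustration), could express an atomic transfer between two transactional variables as
\imlcode{~~\=fu\=n \=transfer (from, to, amount) =\\
\>\>STM.atomically (fn () =>\\
\>\>\>let val b = STM.read from in\\
\>\>\>\>if b < amount then STM.retry ()\\
\>\>\>\>else (STM.write (from, b - amount);\\
\>\>\>\>~~~~~~STM.write (to, STM.read to + amount))\\
\>\>\>end); \textrm{.}}
\noindent
The block passed to \texttt{STM.atomically} is validated and committed as a whole; \texttt{STM.retry} aborts the transaction and re-executes it once one of the variables it has read changes.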
%An example use of STM in StandardML could look like snippet \ref{alg:stm}.
%\begin{MLCode}[alg:stm]{Theoretical Software Transactional Memory use in \textit{Standard ML}}
%\end{MLCode}

%CML similar approach par_conc_ml.pdf
%http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory

\subsection{The Actor Model}
\label{sec:actormodel}
In 1973, Hewitt, Bishop and Steiger suggested a modular actor formalism \cite{hewitt1973universal} which laid the basis of what is now known as the {\em Actor model}. Several publications of the following decade formed a comprehensive Actor model theory. The two central concepts of the Actor model are {\em actors} and {\em messages} \cite{agha1985actors}. Actors are entities that can send and receive messages. Their behavior is defined by the {\bf actions} they perform on receipt of a message (e.g. create another actor) and a finite number of other actors they know about, called {\bf acquaintances}. The acquaintance relation is not necessarily bidirectional, i.e. an acquaintance $B$ of actor $A$ need not know actor $A$. Each actor has an immutable, unique name and a local state. Communication between actors is asynchronous. Incoming messages are processed one at a time, and in response to each message an atomic sequence of actions is performed. Since actors and messages are autonomous, encapsulated entities, they are easy to reason about and can be composed into more complex systems \cite{agha1997foundation}. The order of actions affects an actor's external behavior. But because messaging occurs asynchronously, the processing order of messages is undefined. This leads to non-determinism in computations based on the actor system. There are certain abstractions available which allow for constraints on message ordering and can therefore eliminate part of the non-determinism \cite{karmani2009actor}. The guarantees of the Actor model can be summarized in four important properties:
\begin{enumerate}
\item {\bf Encapsulation.} Unlike shared memory thread models, actors are highly encapsulated. Therefore, they provide no locking or synchronization mechanisms for shared resources. Their internal states are not directly accessible by other actors. The only way for them to exchange data or influence each other is by means of messages. This limitation, however, is violated by some implementations in languages such as \textit{Scala} \cite{thurauakka}. In programming languages which permit mutable data, messages should, in theory, not transfer references to locations in a shared address space. Otherwise the receiving party could modify memory locations that are contained in another actor's state representation. However, the actor library {\em Akka} allows actors to do just that, and it is the programmer's responsibility to maintain encapsulation.
\item {\bf Fairness.} Another important property is fairness with respect to actor sched\-ul\-ing: a message will be delivered to its target actor eventually, i.e. unless an actor dies or displays faulty behavior, it will be scheduled in a fair manner.
\item {\bf Location Transparency.} This property demands that an actor's unique name is independent of its location, i.e. its assigned CPU and memory can potentially belong to a remote machine and even be migrated at run time.
\item {\bf Mobility.} The capability for such migration can enable load balancing and improved fault tolerance. However, implementations do not necessarily satisfy these properties (e.g. \textit{Scala}). {\em Weak mobility} allows code or initial states to be moved, while {\em strong mobility} additionally supports execution state migration \cite{fuggetta1998understanding}. It has been shown that the presence of mobility can significantly improve an architecture's scalability \cite{panwar1994methodology}.
\end{enumerate}
To demonstrate the use of a hypothetical actor library for \textit{Standard ML}, let us define a trivial function \texttt{greet} and execute it in the background:
\imlcode{~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val greeter = Actor.create greet;\\
\>Actor.send greeter "John Doe";}

\subsubsection{\textit{Scala}}
\label{sec:actors_scala}
\textit{Scala} is a programming language for the \textit{Java Virtual Machine} that combines concepts of object-oriented and functional programming \cite{odersky2004overview}. Its distribution comes with a toolkit for actors and message passing functionality called {\em Akka}. \textit{Scala}'s standard actors library, available up to version 2.9.2 \cite{scala2014migration}, has been deprecated and replaced by \textit{Akka}. Its implementation is based on \textit{Erlang}-like, fault-tolerant actors, and its execution model follows an {\em event-based} approach, thus introducing {\em inversion of control}. Compared to \textit{Scala}'s original actors, \textit{Akka} actors are more performant and fault-tolerant and offer automatic load balancing \cite{tasharofi2013scala}. They draw on the language's functional programming features such as partial functions and, like \textit{Erlang}, pattern matching, which are convenient for message passing. Actors potentially provide safer concurrency than traditional shared memory mechanisms because, by design, no race conditions can occur. Also, actors are more lightweight than threads.
% why no race conditions!?!?!?

\subsubsection{Futures}
As mentioned in section \ref{sec:futurespro}, futures are related to the Actor model. \textit{Akka} provides a {\tt Future} class whose constructor accepts a block ({\tt \{...\}}) $b$ and returns immediately with a future value which is a placeholder for the computation's result. $b$ is then executed asynchronously by a thread pool managed by the run-time environment \cite{akka2014futures}. Instead of creating futures directly, \textit{Akka}'s actors have a method {\tt ?} that one can use to send a message. The method then returns a future value whose calculation can be synchronized and retrieved using {\tt Await.result}. There are further functions for composition, combination or ordering of futures as well as exception and error handling. The library also supports direct access to {\tt Promises}, the containers that complete futures.
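In the notation of the hypothetical actor library from section \ref{sec:actormodel}, the {\tt ?} pattern would amount to an \texttt{ask} operation that returns a future instead of discarding the reply:
\imlcode{~~\=val f = Actor.ask greeter "John Doe";\\
\>val reply = Future.join f; \textrm{.}}
\noindent
\texttt{Actor.ask} is an assumption for illustration; in \textit{Akka}, the corresponding result would be retrieved with {\tt Await.result}.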

\subsection{Communicating Sequential Processes}
\label{sec:csp}
CSP (Communicating Sequential Processes) belong to the family of {\em process calculi} and are another model of concurrency. Process calculi are of special interest for functional programming in that they support a declarative notion of concurrency. CSP were first proposed in an acclaimed paper by Hoare in 1978 \cite{hoare1978communicating} and originally designed for reasoning about concurrent processes. Like the Actor model, they utilize message passing for communication. Implementations include {\em occam} \cite{INMOS84occam}, {\em Limbo} \cite{dorward1997programming} and {\em Go} by Google \cite{pike2012go}. The main difference from the Actor model is CSP's use of a concept named {\em channels}: actors communicate directly with each other, while CSP processes send and receive messages on directed channels. Another distinction is the fact that in the original CSP model, messages are delivered synchronously and instantaneously. This means that when a process sends a message to a channel, it waits until another process has received it, and the receiving party blocks until there is a message available on the channel. Processes are anonymous and need to establish channels between each other in order to be able to communicate, which involves a rendezvous, i.e. they need to exchange the receiving and sending ends of channels. While the basic proposal suggested point-to-point channels, this can also be extended to allow multiple processes to know and use a channel's sending and/or receiving end. The developers of {\em Go} have chosen an approach that differs slightly from the original CSP model: channels are asynchronous and buffered. They are declared to transmit a specific da\-ta\-type. Interestingly, channels are first-class values. As a consequence, they can also be transmitted via channels.
Using \textit{Poly/ML}'s thread library, we can utilize a hypothetical CSP library to achieve the same behavior we already saw with actors and invoke \texttt{greet} asynchronously:
\imlcode{\label{alg:csp}
~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val (c\_in, c\_out) = Channel.create () : string channel;\\
\>fun greet\_csp () = Channel.read c\_out |> greet;\\
\>Thread.fork (greet\_csp, []);\\
\>Channel.write c\_in "John Doe";}

\subsubsection{Futures}
CSP, too, can simulate futures (section \ref{sec:futurespro}): a future is a channel that can transmit exactly one element, and the corresponding promise is another process that sends the result of a computation to the channel.
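Using the hypothetical \texttt{Channel} library from the previous example, this simulation can be made concrete: \texttt{future} forks a process that writes the result of \texttt{f} to a fresh channel and hands back the channel's receiving end as the placeholder, while \texttt{Channel.read} plays the role of \texttt{join}:
\imlcode{~~\=fu\=n \=future f =\\
\>\>let val (c\_in, c\_out) = Channel.create ()\\
\>\>in\\
\>\>\>Thread.fork (fn () => Channel.write c\_in (f ()), []);\\
\>\>\>c\_out\\
\>\>end;\\
\>val fibf = future (fn () => fib 30);\\
\>val result = Channel.read fibf; \textrm{.}}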

% "Squeak: a Language for Communicating with Mice"
% "Newsqueak: a Language for Communicating with Mice"
% Go concurrency: http://www.youtube.com/watch?v=3DtUzH3zoFo
%   slides: http://go-lang.cat-v.org/talks/slides/emerging-languages-camp-2010.pdf

% CCS? pi-calculus?
% Clean, Broadcasting Sequential Processes
% about 1 page


\section{Refactoring for Parallelism}
\label{sec:refac_funprogs4mcsystems}
Only a few publications have been dedicated to refactorings for parallelism \cite{kennedy1991interactive,wloka2009refactoring,dig2011refactoring}, and fewer still in the context of functional programming. The {\em ParaForming} tool for \textit{Haskell} \cite{brown2012paraforming} appears to be the first attempt, followed by very recent work on \textit{Erlang} \cite{brown2013cost} and language-independent approaches \cite{hammond2013paraphrase,brown2012language}. Due to the trend towards multi-, many- and even megacore machines, modern programming languages should support parallelism by design, because the introduction of parallelism should not occur as an afterthought during later stages of the software development cycle. \textit{ParaForming} enforces a separation of concerns between business logic and implementation details usually taken care of by system programmers. It offers several additional refactorings for \textit{HaRe} (section \ref{sec:refacfunprogs}) to support efficient, coarse-grained task and data parallelism.
\begin{description}
\item[Introduce Task Parallelism.] This refactoring takes an expression $x$ from a function, embeds $x$ into an {\tt Eval} monad, sparks its evaluation using parallel combinators (section \ref{sec:parcomb}) and substitutes the sparked computation $x'$ for the occurrences of $x$.
\item[Introduce Data Parallelism] takes an expression $l$ which must be a list, replaces it by a {\tt parList}, i.e. a parallel implementation of a list supported by the standard library, and ensures that the parallel versions of operations on $l$ are used.
\item[Introduce Thresholding] allows for the definition of limits on the granularity of the aforementioned refactorings. Thresholding requires a designated expression $t$ and a threshold. Calls to the respective function are only evaluated in parallel if $t$ is not lower than the threshold.
\item[Clustering] refers to data parallelism and accepts a minimum chunk size for a parallel list. Whenever the size of a list falls below this value, the sequential versions of operations are used.
\end{description}
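The effect of thresholding can be sketched with the running {\tt fib} example, here written by hand in \textit{Standard ML} with \textit{Isabelle/ML} futures rather than produced by a refactoring tool; \texttt{t} plays the role of the threshold:
\imlcode{~~\=fu\=n \=fib\_par t x =\\
\>\>if x < t then fib x\\
\>\>else let\\
\>\>\>\>val f = Future.fork (fn () => fib\_par t (x - 2))\\
\>\>\>in fib\_par t (x - 1) + Future.join f end; \textrm{.}}
\noindent
Calls below the threshold fall back to the sequential \texttt{fib}, avoiding sparks whose overhead would exceed their gain.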
\noindent
For other refactorings, examples and a discussion of performance gains and overheads, please refer to \cite{brown2012paraforming}. \textit{ParaForming} is very specific to \textit{Haskell} and its refactorings are relatively straightforward. However, they avoid common pitfalls like forgetting to substitute the use of an expression. Decisions on granularity are not made by the system, but their implementation is automated as much as possible.

\subsubsection{Wrangler}
In 2013, Brown et al. documented their work on an extension of a refactoring tool for \textit{Erlang} named {\em Wrangler} \cite{brown2013cost,li2006comparative}. The provided refactorings make use of skeletons (section \ref{sec:algoskel}) and thereby facilitate informed decisions, based on cost modeling, on which refactorings and skeletons to use.

% paraforming
% based on rewrite rules applied to abstract syntax tree (AST)
% refactoring is function taking a list possible rewrite rules
% rewrite rules have conditions for nodes which they can be applied to

% futurework: standard ml implementation
% "A parallel SML compiler based on algorithmic skeletons"