% doc-isac/mlehnfeld/master/thesis/funproglangs_mcsystems.tex
% final docu of mlehnfeld; author Walther Neuper <neuper@ist.tugraz.at>, 23 Jul 2014
% copy from https://hg.risc.uni-linz.ac.at/wneuper/mlehnfeld
% author Mathias Lehnfeld <s1210629013@students.fh-hagenberg.at>, 20 Jul 2014
\chapter{Parallelism in Functional Programming}
\label{cha:funproglangs_mcsystems}
This section presents several concepts for parallelism and concurrency which have been suggested for and implemented in functional programming languages. The central idea is the philosophy of the declarative paradigm: the developer should not have to worry about {\em how} the parallel execution is organized but at most make informed decisions on {\em what} is to be parallelized. Ideally, process management details such as process creation, resource assignment and synchronization are taken care of by the compiler and run-time system. In practice, various language properties, such as impurity and monadic I/O, may allow for language constructs from imperative programming or even require forms of explicit synchronization.

Many of the mechanisms presented in the following sections have been implemented in and modified for certain programming languages. It seemed appropriate to discuss these realizations and changes in the respective sections. Table \ref{tab:lang_par_overview} gives a brief overview. Note that it only contains those mechanisms and sections that mention concrete implementations.
\begin{table}[ht]
\centering
\caption{Overview of parallelism mechanisms and referenced programming languages.}
\label{tab:lang_par_overview}
\def\rr{\rightskip=0pt plus1em \spaceskip=.3333em \xspaceskip=.5em\relax}
\setlength{\tabcolsep}{1ex}
\def\arraystretch{1.20}
\small
\begin{tabular}{|p{0.33\textwidth}|p{0.36\textwidth}|c|}
  \hline
  \multicolumn{1}{|c}{\em Concept} &
  \multicolumn{1}{|c}{\em Language} &
  \multicolumn{1}{|c|}{\em Section} \\
  \hline\hline
  {\rr Annotations} & {\rr \textit{Clean}} & \ref{sec:annot} \\
  \hline
  {\rr Futures and Promises} & {\rr \textit{Multilisp}, \textit{Isabelle/ML}, \textit{Scala}} & \ref{sec:futurespro}, \ref{sec:actors_scala}, \ref{sec:csp} \\
  \hline
  {\rr Parallel Combinators} & {\rr \textit{Haskell}} & \ref{sec:parcomb} \\
  \hline
  {\rr Algorithmic Skeletons} & {\rr \textit{Java}} & \ref{sec:algoskel} \\
  \hline
  {\rr Data Parallelism} & {\rr \textit{Haskell}} & \ref{sec:data_parallelism} \\
  \hline
  {\rr Soft\-ware Transactional Mem\-o\-ry} & {\rr \textit{C}, \textit{Haskell}, \textit{Java}, \textit{JavaScript}, \textit{Python}, \textit{Scala}} & \ref{sec:stm} \\
  \hline
  {\rr Actors} & {\rr \textit{Erlang}, \textit{Scala}} & \ref{sec:actormodel} \\
  \hline
  {\rr Com\-mu\-ni\-cat\-ing Sequential Pro\-ces\-ses} & {\rr \textit{occam}, \textit{Limbo}, \textit{Go}} & \ref{sec:csp} \\
  \hline
\end{tabular}
\end{table}
For consistency reasons, all sample code is provided in \textit{Standard ML}, even where the demonstrated concepts are not actually available in the language.
%NOTES: IMPLICIT, CONTROLLED, EXPLICIT; TRANSFORMATIONAL VS. REACTIONAL;\\
%MISSING: EVALUATION STRATEGIES (http://hackage.haskell.org/package/parallel-3.1.0.1/docs/Control-Parallel-Strategies.html), ACTIVE DATA STRUCTURES, PIPELINE PARALLELISM


\section{Implicit Parallelism}
\label{sec:implicit_parallelism}
Due to the lack of side effects, it should be possible to compute independent subexpressions in parallel. This means that, in theory, purely functional programs can be parallelized automatically by the compiler. However, the whole process is not trivial, as section \ref{sec:par_funprog} already pointed out. A compiler that supports implicit parallelization must be able to perform granularity and cost analyses in order to achieve advantageous parallelizations and avoid those whose overhead would exceed their potential gain because they are too fine-grained. Additionally, compilers for lazy languages have to incorporate a strictness analysis, i.e. identify expressions which must necessarily be evaluated, such that they do not introduce an overhead by computing expressions whose results are not needed. An example of a function with a strict argument is the {\tt map} operation: if its list argument is undefined, the result of the whole function call is undefined.


\section{Task Parallelism}
\label{sec:task_par}
Instead of automating parallelization, functional general-purpose languages such as \textit{Haskell} and \textit{Clean} provide special syntax to indicate that expressions are to be evaluated in parallel, thereby making theoretically implicit parallelism explicit.

\subsection{Parallel {\tt let}-Expressions}
One proposed method is parallel {\tt let}-expressions \cite{hammer2007proposal}. {\tt let}-expressions are a common way in functional languages to allow the programmer to declare a list of values required for the evaluation of one final expression. The modified version {\tt letpar} has been suggested to indicate a list of independent values which could potentially be computed in parallel. While this keyword is not available in \textit{Standard ML}, a hypothetical use could look like the expression
\imlcode{~~\=le\=tpar\\
\>\>val x = add2 3\\
\>\>val y = fib 8\\
\>\>val z = add 7 5\\
\>in x + y + z end; \textrm{.}}

\subsection{Annotations}
\label{sec:annot}
Another straightforward solution is the use of annotations. This is probably the simplest way of expressing parallelism because it does not affect the semantics of a program but only its run-time behavior. By ignoring the annotations, the program can still be compiled sequentially. This method has been implemented in the programming language \textit{Clean} \cite{nocker1991concurrent}, which also allows indicating the target processing unit for the evaluation. Since eager languages evaluate a function's arguments before invoking the function itself, this mechanism, in its simple form, only works for lazy languages. A potential implementation of the parallel {\tt fib} function\footnote{see comments in appendix \ref{sec:fib-imp}} in a lazy version of \textit{Standard ML} could look somewhat like the definition
\imlcode{\label{alg:parannot}
~~\=fu\=n fib 0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = fib (x - 1) + par (fib (x - 2)); \textrm{.}}
\noindent
A similar construct for eager languages is slightly more involved and will be discussed in the following section.

\subsection{Futures and Promises}
\label{sec:futurespro}
The concepts of {\em promises} and {\em futures} were first suggested in the mid-1970s \cite{baker1977incremental}. The idea is similar to annotations (see previous section). One of the first languages to adopt futures was {\em Multilisp} \cite{halstead1985multilisp}. Since the {\em Isabelle} theorem prover provides its own implementation of futures \cite{matthews2010efficient}, which was introduced to \isac{} during the parallelization process (section \ref{ssec:session-manag}), they are of special interest for this thesis. Futures are objects representing the result of an expression whose value may not be known initially because its computation is managed by the run-time system and may occur in parallel. {\em Implicit} futures are usually an integral part of a language and treated like ordinary references. If a result is not available the first time it is requested, program execution stops until the future value has been obtained. In contrast, {\em explicit} futures require programmers to manually enforce their computation before they can access their results. {\em Isabelle}'s solution follows the latter approach and will be discussed in more detail in section \ref{sec:isafut}. The definition
\imlcode{\label{alg:futures}
~~\=fu\=n \=fi\=b \=0 = 0\\
\>\>| fib 1 = 1\\
\>\>| fib x = let\\
\>\>\>\>\>fun fib' () = fib (x - 2)\\
\>\>\>\>\>val fibf = Future.fork fib'\\
\>\>\>\>in fib (x - 1) + (Future.join fibf) end;}
\noindent
demonstrates the use of futures in \textit{Isabelle/ML}\footnote{see comments in appendix \ref{sec:fib-imp}}. The function \texttt{Future.fork} accepts one argument which must be a function of type \texttt{unit -> 'a}, i.e. one which accepts an empty argument ``\texttt{()}'' and returns an arbitrary type. \texttt{Future.join} requires a future value as its argument and returns the result contained in it.
As with general evaluation strategies, futures can be evaluated eagerly or lazily, i.e. their computation can be started on creation of the data structure or on demand. Futures are read-only placeholders. Promises, in contrast, are the single-assignment containers which hold a future's value. Futures and promises are closely linked to Communicating Sequential Processes (section \ref{sec:csp}) and also to the Actor model (section \ref{sec:actors_scala}) and are therefore available e.g. in \textit{Scala} \cite{akka2014futures}. Although their adoption originated in functional programming, their use is not limited to this paradigm.
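The relationship between the two concepts can be sketched in \textit{Standard ML}: a future can be built on top of a promise. The \texttt{Promise} structure with \texttt{create}, \texttt{fulfill} and \texttt{get} is a hypothetical API assumed for illustration (it does not exist in the standard basis); only \texttt{Thread.fork} corresponds to \textit{Poly/ML}'s thread library.

```sml
(* Hypothetical API: Promise.create, Promise.fulfill and Promise.get
   are assumptions, not part of any standard library. *)
fun future (f : unit -> 'a) : unit -> 'a =
  let
    val p = Promise.create ()   (* single-assignment container *)
    val _ = Thread.fork (fn () => Promise.fulfill p (f ()), [])
  in
    fn () => Promise.get p      (* blocks until the promise is fulfilled *)
  end;

val fibf = future (fn () => fib 30);  (* computation may run in parallel *)
val result = fibf ();                 (* join: wait for and read the value *)
```

The sketch shows why the promise stays hidden: the caller only ever sees the read-only placeholder returned by \texttt{future}.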
%http://en.wikipedia.org/wiki/Task_parallelism, Scala, Isabelle

\subsection{Parallel Combinators}
\label{sec:parcomb}
\textit{Haskell}'s approach to task parallelism is the use of parallel combinators. Its run-time system manages a queue of tasks, so-called {\em sparks}, whose evaluation begins as soon as an idle CPU is detected. The language provides the combinator {\tt par}, which accepts two arguments, ``sparks'' the evaluation of the first and returns the computed result of the second. For more details see e.g. \cite{marlow2009runtime}.
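Transliterated into \textit{Standard ML}, a hypothetical \texttt{par} combinator would have to take thunks, since the language is eager; \texttt{par} itself is an assumption here, not an existing library function.

```sml
(* Hypothetical combinator: par sparks its first argument and returns
   the result of the second. Note that without the sharing provided by
   Haskell's lazy thunks, the sparked result is recomputed below; a
   realistic ML version would memoize the spark. *)
fun pfib n =
  if n < 2 then n
  else par (fn () => pfib (n - 2))                    (* spark *)
           (fn () => pfib (n - 1) + pfib (n - 2));    (* main result *)
```

In \textit{Haskell}, laziness makes the sparked expression and its later use refer to the same thunk, which is what makes \texttt{par} so cheap to insert.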
%CAN BE EXTENDED

\subsection{Algorithmic Skeletons}
\label{sec:algoskel}
The idea of algorithmic skeletons is simple and not limited to parallelism. Most parallel algorithms exhibit common patterns, which can be abstracted into higher-order functions. A well-known example is {\em divide-and-conquer}:
\imlcode{~~\=fu\=n \=da\=c \=is\=\_trivial solve divide conquer x =\\
\>\>\>if is\_trivial x then\\
\>\>\>\>solve x\\
\>\>\>else\\
\>\>\>\>divide x\\
\>\>\>\>\>|> map (dac is\_trivial solve divide conquer)\\
\>\>\>\>\>|> conquer; \textrm{,}}
\noindent
where \texttt{x |> f} is syntactic sugar for \texttt{f x} and \texttt{is\_trivial} is a function with one argument and a \texttt{bool} return value. In order to parallelize the algorithm, we only need to use a parallel version of {\tt map}. Other common patterns include {\em branch-and-bound} and {\em dynamic programming}. While this method is also available in imperative languages such as \textit{Java}, the fact that functional languages treat higher-order functions as first-class citizens makes them particularly suitable for the use of skeletons. Algorithmic skeletons hide the orchestration between sequential portions of code from the programmer. Sometimes the compiler can provide low-level support for the skeletons. Furthermore, multiple primitive skeletons can be composed to produce more powerful patterns. Unlike other high-level parallelism mechanisms, they lend themselves to cost modeling \cite{hammond2000research}.
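Such a parallel \texttt{map} can be sketched with the \texttt{Future} operations from section \ref{sec:futurespro}: fork one future per element first, then join them all, so the element computations may proceed in parallel. Whether this pays off depends on task granularity; the sketch assumes the \textit{Isabelle/ML}-style \texttt{Future.fork}/\texttt{Future.join} interface shown earlier.

```sml
(* Parallel map skeleton: all element computations are forked before
   any is joined, so they can run concurrently. *)
fun par_map f xs =
  let val futures = map (fn x => Future.fork (fn () => f x)) xs
  in map Future.join futures end;

(* A parallel divide-and-conquer is then obtained simply by replacing
   map with par_map in the dac skeleton above. *)
```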


\section{Data Parallelism}
\label{sec:data_parallelism}
% COULD BE EXTENDED (nested parallelism details, active data structures, Scala parallel collections)
Many operations require the application of the same function to multiple items of a data set. This can generally occur in parallel. With this kind of algorithm, the parallel portion naturally depends on the problem size. The same issues we encountered before apply here: the granularity is often too fine and the overhead exceeds the benefits of parallel processing. Most applications of data parallelism are based on array structures, and early implementations specialized in highly parallel, high-performance and scientific computing \cite{hammond2000research}. Recent developments in GPU hardware and the arrival of general-purpose GPU programming thanks to projects such as {\em CUDA} and {\em Accelerator} \cite{tarditi2006accelerator} have fueled data parallelism research during the last few years \cite{nickolls2010gpu}. Distributed, parallel algorithms like Google's {\em MapReduce} should be mentioned here. However, we will not discuss distributed parallelism in this thesis.
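The granularity problem is commonly mitigated by chunking: each parallel task processes a whole slice of the data rather than a single element. A sketch using the \texttt{Future} operations from section \ref{sec:futurespro} (the chunk size of 1000 is an arbitrary assumption):

```sml
(* Split a list into chunks of at most n elements. *)
fun chunks _ [] = []
  | chunks n xs =
      let val k = Int.min (n, length xs)
      in List.take (xs, k) :: chunks n (List.drop (xs, k)) end;

(* Data-parallel map: one future per chunk keeps the task granularity
   coarse enough for the parallelism to pay off. *)
fun par_map_chunked f xs =
  let val futs = map (fn c => Future.fork (fn () => map f c)) (chunks 1000 xs)
  in List.concat (map Future.join futs) end;
```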

In 1990, Blelloch and Sabot first made a distinction between {\em flat} and {\em nested} parallelism \cite{blelloch90compiling}. Until then, data parallelism implementations had only considered sequential functions for parallel application on data sets. Nested data parallelism is significantly more complex in that it allows the application of functions which are themselves parallel. More than 15 years later, this functionality was suggested and implemented as an extension to the \textit{Glasgow Haskell Compiler} called {\em Data Parallel Haskell} \cite{peytonjones2008harnessing}.


\section{Concurrent Functional Programming}
\label{sec:concurrentfunprog}
As we saw in section \ref{sec:parallelism}, concurrency describes two or more tasks whose lifespans may overlap, while parallelism exploits multiple processing units by executing multiple tasks simultaneously. Section \ref{sec:concurresp} outlined how concurrency can be beneficial by hiding latencies and improving responsiveness. Now we want to explore concurrency models and control mechanisms in functional programming.
% could further be extended by process calculi

\subsection{Software Transactional Memory}
\label{sec:stm}
While hardware support for transactions had been introduced before, and transactions were a fundamental aspect of fault tolerance in the context of database systems, software-only transactional memory (STM) was first suggested by Shavit and Touitou in 1997 \cite{shavit1997software}. Unlike {\em lock-based synchronization}, STM generally is a non-blocking strategy. Transactions are sequences of read and write operations on a shared memory. They are tagged as atomic blocks and processed optimistically, regardless of other threads potentially performing alterations to the memory simultaneously. Each access to the memory is logged, and once a transaction is finished, a final validation step verifies that no other thread modified any of the elements which were read during the process, i.e. that the transaction's view of the memory was consistent. If there was no conflicting write access, the transaction is committed to the memory. Otherwise the whole process is repeated until no more conflicts arise. In order to be correct, transactions must be linearizable, i.e. there must be some sequence in which they could have been executed without temporal overlap that yields the same resulting state of the shared memory as their concurrent execution \cite{herlihy1990linearizability}. Because of their non-blocking nature, STMs do not cause deadlocks or priority inversion. They are inherently fault-tolerant to a certain extent because transactions can be undone in the event of a timeout or an exception. Also, they are free from the tradeoffs between lock granularity and concurrency.
Recent work on {\em commitment ordering} has helped reduce the number of transaction conflicts and thereby improve the performance of STM implementations \cite{ramadan2009committing,zhang2010software}. Also, there are various STM solutions that do in fact use locking mechanisms during different phases of a transaction. They can potentially reduce the number of rollbacks and consequently also make process coordination faster \cite{ennals2006software}. The significant difference between these approaches and conventional, mutex-based thread synchronization mechanisms is that with STM, any locking is invisible to the programmer and taken care of by the run-time system. The downside of STM is the performance overhead caused by transaction management, which typically impairs the performance of programs with a low number of threads, such that they are slower than a sequential implementation. STMs, however, scale particularly well to higher thread and processor numbers \cite{saha2006mcrt}.
Several implementations for various programming languages including \textit{C}, \textit{Java}, \textit{Scala}, \textit{Python} and even \textit{JavaScript} are available. \textit{Concurrent Haskell} offers multiple language constructs built on the basis of STM to allow for composable, nested, alternative and blocking transactions as well as STM exceptions \cite{harris2005composable}.
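In keeping with the convention of \textit{Standard ML} samples, an STM transaction could be sketched as follows. The API (\texttt{TVar.new}, \texttt{TVar.read}, \texttt{TVar.write}, \texttt{atomically}) is a hypothetical assumption modeled after \textit{Concurrent Haskell}'s STM; no such library exists for \textit{Standard ML}.

```sml
(* Hypothetical STM API: TVar.new, TVar.read, TVar.write and
   atomically are assumptions, modeled after Concurrent Haskell. *)
val account_a = TVar.new 100;
val account_b = TVar.new 0;

(* The transfer runs as one transaction: other threads never observe
   the intermediate state, and the block is retried on conflict. *)
fun transfer amount =
  atomically (fn () =>
    let val a = TVar.read account_a
        val b = TVar.read account_b
    in  TVar.write account_a (a - amount);
        TVar.write account_b (b + amount)
    end);
```

The point of the sketch is that no locks appear in user code: conflict detection and retry are entirely the run-time system's concern.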
%An example use of STM in StandardML could look like snippet \ref{alg:stm}.
%\begin{MLCode}[alg:stm]{Theoretical Software Transactional Memory use in \textit{Standard ML}}
%\end{MLCode}

%CML similar approach par_conc_ml.pdf
%http://channel9.msdn.com/Shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Software-Transactional-Memory

\subsection{The Actor Model}
\label{sec:actormodel}
In 1973, Hewitt, Bishop and Steiger suggested a modular actor formalism \cite{hewitt1973universal} which laid the basis of what is now known as the {\em Actor model}. Several publications of the following decade formed a comprehensive Actor model theory. The two central concepts of the Actor model are {\em actors} and {\em messages} \cite{agha1985actors}. Actors are entities that can send and receive messages. Their behavior is defined by the {\bf actions} they perform on receipt of a message (e.g. create another actor) and a finite number of other actors they know about, called {\bf acquaintances}. The acquaintance relation is not necessarily bidirectional, i.e. an acquaintance $B$ of actor $A$ need not know actor $A$ itself. Each actor has an immutable, unique name and a local state. Communication between actors is asynchronous. Incoming messages are processed one at a time, and in response to each message an atomic sequence of actions is performed. Since actors and messages are autonomous, encapsulated entities, they are easy to reason about and can be composed into more complex systems \cite{agha1997foundation}. The order of actions affects an actor's external behavior. But because messaging occurs asynchronously, the processing order of messages is undefined. This leads to non-determinism in computations based on the actor system. Certain abstractions are available which allow for constraints on message ordering and can therefore eliminate part of the non-determinism \cite{karmani2009actor}. The semantics of the model can be summarized in four important properties:
\begin{enumerate}
\item {\bf Encapsulation.} Unlike shared memory thread models, actors are highly encapsulated. Therefore, they provide no locking or synchronization mechanisms for shared resources. Their internal states are not directly accessible by other actors. The only way for them to exchange data or influence each other is by means of messages. This principle, however, is violated by some implementations in languages such as \textit{Scala} \cite{thurauakka}. In programming languages which permit mutable data, messages should, in theory, not transfer references to locations in a shared address space; otherwise the receiving party could modify memory locations that are contained in another actor's state representation. However, the {\em Akka} actor library for the \textit{Java Virtual Machine} allows actors to do just that, and it is the programmer's responsibility to maintain encapsulation.
\item {\bf Fairness.} Another important property is fairness with respect to actor sched\-ul\-ing: a message will be delivered to its target actor eventually, i.e. unless an actor dies or displays faulty behavior, it will be scheduled in a fair manner.
\item {\bf Location Transparency.} This property demands that an actor's unique name is independent of its location, i.e. its assigned CPU and memory can potentially belong to a remote machine and even be migrated at run time.
\item {\bf Mobility.} The ability for this migration can enable load balancing and improved fault tolerance. However, implementations do not necessarily satisfy these properties (e.g. \textit{Scala}). {\em Weak mobility} allows code or initial states to be moved, while {\em strong mobility} additionally supports execution state migration \cite{fuggetta1998understanding}. It has been shown that the presence of mobility can significantly improve an architecture's scalability \cite{panwar1994methodology}.
\end{enumerate}
To demonstrate the use of a hypothetical actor library for \textit{Standard ML}, let us define a trivial function \texttt{greet} and execute it in the background:
\imlcode{~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val greeter = Actor.create greet;\\
\>Actor.send greeter "John Doe";}

\subsubsection{\textit{Scala}}
\label{sec:actors_scala}
\textit{Scala} is a programming language for the \textit{Java Virtual Machine} that combines concepts of object-oriented and functional programming \cite{odersky2004overview}. Its distribution comes with a toolkit for actors and message passing functionality called {\em Akka}. \textit{Scala}'s standard actors library, available up to version 2.9.2 \cite{scala2014migration}, has been deprecated and replaced by \textit{Akka}. Its implementation is based on \textit{Erlang}-like, fault-tolerant actors, and its execution model follows an {\em event-based} approach, thus introducing {\em inversion of control}. Compared to \textit{Scala}'s original actors, \textit{Akka} actors are more performant, more fault-tolerant and offer automatic load balancing \cite{tasharofi2013scala}. They draw on the language's functional programming features such as partial functions and, like \textit{Erlang}, pattern matching, which are convenient for message passing. Actors potentially provide safer concurrency than traditional shared memory mechanisms because, by design, no data races on shared memory can occur. Also, actors are more lightweight than threads.
% why no race conditions!?!?!?

\subsubsection{Futures}
As mentioned in section \ref{sec:futurespro}, futures are related to the Actor model. \textit{Akka} provides a {\tt Future} class whose constructor accepts a block ({\tt \{...\}}) $b$ and returns immediately with a future value which is a placeholder for the computation's result. $b$ is then executed asynchronously by a thread pool managed by the run-time environment \cite{akka2014futures}. Instead of creating futures directly, one can send a message to an \textit{Akka} actor using its {\tt ?} method, which returns a future value whose result can be awaited and retrieved using {\tt Await.result}. There are further functions for composition, combination and ordering of futures as well as exception and error handling. The library also supports direct access to {\tt Promise}s, the containers that complete futures.
\subsection{Communicating Sequential Processes}
\label{sec:csp}
CSP (Communicating Sequential Processes) belongs to the family of {\em process calculi} and is another model of concurrency. Process calculi are of special interest for functional programming in that they support a declarative notion of concurrency. CSP was first proposed in an acclaimed paper by Hoare in 1978 \cite{hoare1978communicating} and originally designed for reasoning about concurrent processes. Like the Actor model, it utilizes message passing for communication. Implementations include {\em occam} \cite{INMOS84occam}, {\em Limbo} \cite{dorward1997programming} and {\em Go} by Google \cite{pike2012go}. The main difference from the Actor model is CSP's use of a concept named {\em channels}. Actors communicate directly with each other, while CSP processes send and receive messages on directed channels. Another distinction is the fact that in the original CSP model, messages are delivered synchronously and instantaneously: when a process sends a message to a channel, it waits until another process has received it, and the receiving party blocks until there is a message available on the channel. Processes are anonymous and need to establish channels between each other before they can communicate, which involves a rendezvous, i.e. an exchange of the receiving and sending ends of channels. While the basic proposal suggested point-to-point channels, this can be extended to allow multiple processes to know and use a channel's sending and/or receiving end. The developers of {\em Go} have chosen an approach that differs slightly from the original CSP model: channels are asynchronous and buffered. They are declared to transmit a specific da\-ta\-type. Interestingly, channels are first-class values; as a consequence, they can also be transmitted via channels.
Using \textit{Poly/ML}'s thread library, we can utilize a supposed CSP library to achieve the same behavior we already saw with actors and invoke \texttt{greet} asynchronously:
\imlcode{\label{alg:csp}
~~\=fun greet name = print ("Hello " \^~name \^~"!\textbackslash{}n");\\
\>val (c\_in, c\_out) = Channel.create () : string channel;\\
\>fun greet\_csp () = Channel.read c\_out |> greet;\\
\>Thread.fork (greet\_csp, []);\\
\>Channel.write c\_in "John Doe";}

\subsubsection{Futures}
CSP, too, can simulate futures (section \ref{sec:futurespro}): a future is a channel that can transmit exactly one element, and the corresponding promise is another process that sends the result of a computation to the channel.
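Continuing the supposed CSP library from above, this simulation can be sketched as follows; \texttt{Channel.create}, \texttt{Channel.read} and \texttt{Channel.write} remain hypothetical assumptions.

```sml
(* A future as a one-shot channel: the forked "promise" process writes
   the result exactly once; reading the channel joins the computation. *)
fun future (f : unit -> 'a) : unit -> 'a =
  let
    val (c_in, c_out) = Channel.create ()
    val _ = Thread.fork (fn () => Channel.write c_in (f ()), [])
  in
    fn () => Channel.read c_out   (* blocks until the result arrives *)
  end;
```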

% "Squeak: a Language for Communicating with Mice"
% "Newsqueak: a Language for Communicating with Mice"
% Go concurrency: http://www.youtube.com/watch?v=3DtUzH3zoFo
%   slides: http://go-lang.cat-v.org/talks/slides/emerging-languages-camp-2010.pdf

% CCS? pi-calculus?
% Clean, Broadcasting Sequential Processes
% about 1 page


\section{Refactoring for Parallelism}
\label{sec:refac_funprogs4mcsystems}
Only a few publications have been dedicated to refactoring for parallelism \cite{kennedy1991interactive,wloka2009refactoring,dig2011refactoring}, even fewer in the context of functional programming. The {\em ParaForming} tool for \textit{Haskell} \cite{brown2012paraforming} appears to be the first attempt, followed by very recent work on \textit{Erlang} \cite{brown2013cost} and language-independent approaches \cite{hammond2013paraphrase,brown2012language}. Due to the trend towards multi-, many- and even megacore machines, modern programming languages should support parallelism by design, because the introduction of parallelism should not occur as an afterthought during later stages of the software development cycle. \textit{ParaForming} enforces a separation of concerns between business logic and implementation details usually taken care of by system programmers. It offers several additional refactorings for \textit{HaRe} (section \ref{sec:refacfunprogs}) to support efficient, coarse-grained task and data parallelism.
\begin{description}
\item[Introduce Task Parallelism.] This refactoring takes an expression $x$ from a function, embeds $x$ into an {\tt Eval} monad, sparks its evaluation using parallel combinators (section \ref{sec:parcomb}) and replaces the occurrences of $x$ with the sparked computation $x'$.
\item[Introduce Data Parallelism] takes an expression $l$ which must be a list, replaces it by a {\tt parList}, i.e. a parallel list implementation supported by the standard library, and ensures that parallel versions of the operations on $l$ are used.
\item[Introduce Thresholding] allows for the definition of limits to the granularity of the aforementioned refactorings. Thresholding requires a designated expression $t$ and a threshold. Calls to the respective function are only evaluated in parallel if $t$ is not lower than the threshold.
\item[Clustering] refers to data parallelism and accepts a minimum chunk size for a parallel list. Whenever the size of a list falls below this value, the sequential versions of operations are used.
\end{description}
\noindent
For other refactorings, examples and a discussion of performance gains and overheads, please refer to \cite{brown2012paraforming}. \textit{ParaForming} is very specific to \textit{Haskell} and its refactorings are relatively straightforward. However, they avoid common pitfalls like forgetting to substitute the use of an expression. Decisions on granularity are not made by the system, but their implementation is automated as much as possible.
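The effect of {\em thresholding} can be illustrated with a \textit{Standard ML} sketch; the \texttt{Future} operations are those from section \ref{sec:futurespro}, and the threshold value of 20 is an arbitrary assumption.

```sml
(* Thresholding: below the threshold, forking a future would cost more
   than it gains, so the subproblem is computed sequentially. *)
val threshold = 20;

fun fib 0 = 0
  | fib 1 = 1
  | fib x =
      if x < threshold
      then fib (x - 1) + fib (x - 2)                       (* sequential *)
      else
        let val fibf = Future.fork (fn () => fib (x - 2))  (* parallel *)
        in fib (x - 1) + Future.join fibf end;
```

Clustering applies the same idea to data parallelism: below a minimum chunk size, the sequential list operations are used.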

\subsubsection{Wrangler}
In 2013, Brown et al. documented their work on an extension of {\em Wrangler} \cite{brown2013cost,li2006comparative}, a refactoring tool for \textit{Erlang}. The provided refactorings make use of skeletons (section \ref{sec:algoskel}) and facilitate informed decisions, based on cost models, on which refactorings and skeletons to use.

% paraforming
% based on rewrite rules applied to abstract syntax tree (AST)
% refactoring is function taking a list possible rewrite rules
% rewrite rules have conditions for nodes which they can be applied to

% futurework: standard ml implementation
% "A parallel SML compiler based on algorithmic skeletons"