NAP      


   Networked Associative Pipes                Rocketcalc, LLC (Kent, Ohio, USA)
   Nap's a pipe.

Tutorial: Parallel programming with shell scripts

The Sieve of Eratosthenes is a widely-studied algorithm for generating prime numbers. The algorithm is easy to understand and can be implemented in many interesting ways, some of which are computationally very efficient. This tutorial presents a few elementary sequential and parallel implementations of the algorithm written as Bash shell scripts. The tutorial is designed to illustrate that parallel programming can be easy.

The Sieve of Eratosthenes

Eratosthenes algorithm for computing primes less than a given number N is vey straightforward:
  1. Make a list of the numbers from 2, 3, ..., up to N.
  2. Start at 2. Cross off all multiples of two (not including two).
  3. Move on to the next number in the list not yet crossed off and cross off all its multiples (not including itself).
  4. Repeat the last step until you exceed the square root of N.
The numbers left in the list that have not been crossed out after this procedure are the primes between 2 and N. Figure 1 below illustrates the idea for the numbers from 2 to 31.

Figure 1: The Sieve of Eratosthenes


Listing 1 illustrates a basic implementation of this algorithm using the Bash shell scripting language. The script takes N as its argument and returns a list of primes less than N. The script uses the bc calculator to compute the square root of N. The seq command is used to set up the array P with the list of numbers from 2 to N. The script then exactly follows the above algorithm in the body of the while loop. The Bash unset command clears the corresponding array entry. The script finally prints the contents of the array, which by the end of the script only contains primes.
#!/bin/bash
# basic_sieve
# The Sieve of Eratosthenes (basic version)

M=$1
N=$(echo "scale=0;sqrt($M)+1" | bc)
P=( $(seq 2 $M) )

j=0
while test $j -le $N; do
  if test -n "${P[$j]}"; then
    k=$j
    until test $k -gt $M; do
      let k+=${P[$j]}
      unset P[$k]
    done
  fi  
  let j=$j+1
done  
echo ${P[*]}
Listing 1: basic_sieve (download it here)


Observations and minor optimizations

Several observations on Figure 1 and Listing 1 are in order:
  1. Other than 2, we know that no even number is prime.
  2. We don't need to strike off all multiples of each prime p, only those greater than or equal to p×p. Smaller multiples will have already been crossed off in an earlier iteration.
We can incorporate these observations into our code to improve the performance. Omitting even numbers from consideration is only practical if we can easily generate an array of odd numbers. Fortunately, the seq command can do this. We can implement observation 2 by simply adjusting the starting position in the loop that strikes out entries.

Listing 2 below presents a modified version of the program that incorporates the above optimizations. The slight complication in indexing introduced by skipping the even numbers is the price of optimization.

#!/bin/bash
# sieve
# The Sieve of Eratosthenes  (better version)
# Save this file as 'sieve' and run, for example, as:
# ./sieve 100

[ $# -ne 1 ] && { echo "Usage: sieve "; exit 1; }
M=$(($1 / 2))
N=$(echo "scale=0;sqrt($1)/2+1" | bc)

# Skip the even numbers; P is an array of all odd numbers up to $1. 
P=( $(seq 3 2 $1) )

j=0
while test $j -le $N; do
# If P[j] is non-empty, then it's a prime. Remove its multiples...
  if test -n "${P[$j]}"; then
# ...starting from P[j]^2, as smaller multiples will have already been removed by an earlier iteration
    k=$(((${P[$j]} * ${P[$j]})/2 - 1))
    while test $k -le $M; do
      unset P[$k]
      let k+=${P[$j]}
    done
  fi  
  let j+=1
done  
echo 2 ${P[*]}
Listing 2: Better sieve (download it here)

Parallel implementation for multicore/SMP machines

Most of the work done in the sieve programs listed above occurs in the inner loop. If multiple processors are available, some of the work in that loop can be divided up among the processors and computed in parallel. A so-called master/slave approach can be used to divide the work up among NP processors. The method is illustrated by the pseudocode in Listing 3 and graphically in Figure 2 below.
INPUT integer N and number of programs to run NP

Form NP disjoint sequences P(1), P(2),..., P(NP) that cover the integers 
from 2, 3, ..., N. Make sure that the first list P(1) contains numbers up
to at least floor(square root of N). For example, say N=31 and NP=3. One 
possible division is:
P(1)={2, 3, 4, 5, 7}
P(2)={8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}
P(3)={20,21,22, 23, 24, 25, 26, 27, 28, 29, 30, 31}


Start one master process using the first list P(1).

Start NP-1 slave processes using data P(2), ..., P(NP), respectively.

MASTER PROGRAM
--------------
1 Run the standard Sieve of Eratosthenes on our portion of the list. Each time
  a prime less than the square root of N is found, send it to all of the slave
  processes.

2 Send an exit code to each slave process when our sieve algorithm is done.

3 For j=2...NP

4   Collect and print the primes from process j

5 End for

SLAVE PROGRAM
-------------
1 Loop

2   Wait for a new prime P or exit code

3   If P is an exit code, exit loop

4   Strike each multiple of P from our list

5 End loop

6 Send the remaining numbers on our list to the master process


Listing 3: Psuedocode of a master/slave parallel sieve

Figure 2: Illustration of the master/slave implementation

Figure 2 illustrates that the slave processes may not all run in synch. Potential delays occur in this scheme whenever a slave waits for a new prime, and when the master program waits for the slaves to deliver their primes at the end.

It is relatively easy to implement this method using NAP and Bash. NAP's non-blocking output pipe (available only for small messages) can be used by the master process to output a list of primes up to the square root of N to each slave process. If the master's data list is shorter than the slave's lists, then it is likely that the master will issue primes faster than the slaves can use them, avoiding delays in the slave program.

Listing 4 below shows the parallel master/slave sieve using NAP and Bash. The program takes two arguments, N and the number of slave processes. The program starts the slave processes locally, suitable for use on an SMP machine. Most of the added complexity in the program results from maintaining proper indexing for each sub-list of integers.

#!/bin/bash
# psieve
# Sieve of Eratosthenes (master/local slaves)
#
# Example:
# ./psieve 100 2
#
# --------------------------------------------------------------------------
# Slave process
# --------------------------------------------------------------------------
if test $# -eq 0; then
  p=( $(nin partition) )
  P=( $(seq ${p[0]} 2 ${p[1]}) )
  n=${#P[*]}
  while true; do
# Read the next prime
    m=$(nin "P${p[0]}")
    let mm=$m*$m
# Zero means we're done
    [ $m -eq 0 ] && break; 
# Determine the next largest odd multiple of $m in our range:
    if test $(( ${p[0]} % $m )) -eq 0; then
      k=${p[0]}
    else
      k=$(( ((${p[0]} + $m - ${p[0]} % $m) / $m) * $m ))
      k=$(( $k + (($k % 2 + 1) % 2) * $m ))
    fi
    [ $k -lt  $mm ] && k=$mm;
# Convert to an index
    k=$(( ($k - ${p[0]}) / 2 ))
    [ $k -gt $n ] && continue;
    while test $k -lt $n; do
      unset P[$k]
      let k+=$m
    done
  done
# Report the results from this partition
  echo ${P[*]} | nbout "Q${p[0]}"
  exit 0
fi

# --------------------------------------------------------------------------
# Master process
# --------------------------------------------------------------------------
N=$(echo "scale=0;sqrt($1)/2+1" | bc)
h=$(($1 / $2))
n=$(($h + $h % 2 - 1))
n2=$(($n / 2))
h=$(($n + 1))
k=$(($n + 2))

# First partition must contain sqrt(number)
[ $n -lt $N ] && echo "Too many partitions" && exit 1

# Set up the partitions, including one to run on the master
P=( $(seq 3 2 $n) )
c=0;
while test $k -lt $1; do
  l=$(($k + $h))
  [ $l -gt $1 ] && l=$1
  echo "$k $l"    | nbout partition
  W[$c]=$k
  let k=$l+2
  let c=$c+1
done
c=$(seq 0 $(($c - 1)))

# Start up $c copies of the slave process (locally)
for i in $c; do
  $0 &
done

# Run the sieve
j=0
while test $j -le $N; do
  if test -n "${P[$j]}"; then
# Send the next prime to each slave
    for k in $c; do
      echo ${P[$j]} | nout "P${W[$k]}"
    done
    k=$(((${P[$j]} * ${P[$j]})/2 - 1))
    until test $k -gt $n2; do
      unset P[$k]
      let k+=${P[$j]}
    done
  fi
  let j+=1
done

# Instruct the slaves to print results and exit
for k in $c; do
  echo 0 | nbout "P${W[$k]}"
done

# Collect and print results in order
echo -n 2 ${P[*]}  				
echo -n " "
for k in $c; do
  echo -n "$(nin "Q${W[$k]}") "		
done
echo
Listing 4: Parallel sieve (psieve) (download it here)

Parallel implementation for clusters

NAP pipes work transparently over networks. This feature makes it very easy to adapt the SMP parallel program displayed in Listing 4 to work on distributed cluster computers.

Listing 5 shows a modified version of the SMP program of Listing 4 for use with clusters and the "ssh" command. Instead of starting slave processes locally, the modified program starts the slave processes on a list of local and/or remote nodes using the ssh command. The syntax for running the new program is:

./psieve.ssh N NP node1 node2 node3 ... node_k

The number of processes NP and the length of the listed nodes need not match. If the number of processes NP is less than the length of the node list, then the processes will be launched on the first NP nodes in the list. If the number of processes NP is greater than the length of the node list, then processes will be launched in order on a round-robin fashion on the nodes in the list.

#!/bin/bash
# psieve.ssh
# Sieve of Eratosthenes (master/slave ssh version)
#
# Example:
# (make sure nap is running)
# ./psieve.ssh 1000 5 n2 n3 n4 n1
#
# The example computes the primes less than 1000 using five processes.
# The first process runs locally as the master. The remaining four are
# run on nodes n2 n3 n4 and n1. 

# --------------------------------------------------------------------------
# Slave process
# --------------------------------------------------------------------------
if test $# -eq 0; then
  p=( $(nin partition) )
  P=( $(seq ${p[0]} 2 ${p[1]}) )
  n=${#P[*]}
  while true; do
# Read in the next prime
    m=$(nin "P${p[0]}")
    let mm=$m*$m
# Zero is the exit code
    [ $m -eq 0 ] && break; 
# Determine the next largest odd multiple of $m in our range:
    if test $(( ${p[0]} % $m )) -eq 0; then
      k=${p[0]}
    else
      k=$(( ((${p[0]} + $m - ${p[0]} % $m) / $m) * $m ))
      k=$(( $k + (($k % 2 + 1) % 2) * $m ))
    fi
    [ $k -lt  $mm ] && k=$mm;
# Convert to an index in the array
    k=$(( ($k - ${p[0]}) / 2 ))
    [ $k -gt $n ] && continue;
    while test $k -lt $n; do
      unset P[$k]
      let k=$k+$m
    done
  done
  echo ${P[*]} | nbout "Q${p[0]}"
  exit 0
fi


# --------------------------------------------------------------------------
# Master process
# --------------------------------------------------------------------------

M=$1
C=$2

# Collect the list of nodes
declare -a NODE
shift
shift
np=0
while test "$#" -ge 1; do
  NODE[$np]=$1
  let np+=1
  shift
done

N=$(echo "scale=0;sqrt($M)/2 + 1" | bc)
h=$(($M / $C))
n=$(($h + $h % 2 - 1))
n2=$(($n / 2))
h=$(($n + 1))
k=$(($n + 2))

# The first partition must contain sqrt(M)
[ $n -lt $N ] && echo "Too many partitions" && exit 1

# Set up the partitions, including one to run on the master
P=( $(seq 3 2 $n) )
c=0;
while test $k -lt $M; do
  l=$(($k + $h))
  [ $l -gt $M ] && l=$M
  echo "$k $l"    | nbout partition
  W[$c]=$k
  let k=$l+2
  let c=$c+1
done
c=$(seq 0 $(($c - 1)))

# Start up the slave processes on the listed nodes
for i in $c; do
  j=$(($i % $np))
  echo ssh -f ${NODE[$j]} $(pwd)/$0
  ssh -f ${NODE[$j]} $(pwd)/$0
done

# Run the sieve
j=0
while test $j -le $N; do
  if test -n "${P[$j]}"; then
# Send the next prime to each slave ($c of them)
    for k in $c; do
      echo ${P[$j]} | nout "P${W[$k]}"
    done
    k=$(((${P[$j]} * ${P[$j]})/2 - 1))
    until test $k -gt $n2; do
      unset P[$k]
      let k+=${P[$j]}
    done
  fi
  let j+=1
done

# Instruct the slaves to print results and exit
for k in $c; do
  echo 0 | nbout "P${W[$k]}"
done

# Collect and print results in order
echo -n 2 ${P[*]}  				
echo -n " "
for k in $c; do
  echo -n "$(nin "Q${W[$k]}") "		
done
echo
Listing 5: Parallel sieve for clusters (psieve.ssh) (download it here)

Coda: Recusive implementations of the sieve

Many recursive implementations of the sieve of Eratosthenes have been devised, although they are probably not what Eratosthenes had in mind. The recursive versions are often short and elegant in their own way, but typically computationally inefficient. For completeness, Listing 6 illustrates a recursive implementation of the sieve written in the Bash scripting language.
#!/bin/bash
# Find primes less than N with, e.g.,: seq 2 N | ./rsieve
read p
[ -z "$p" ] && exit
echo "$p"
while true; do
    read q
    [ -z "$q" ] && exit
    [ $(($q % $p)) != 0 ] && echo $q
done | $0 
Listing 6: Recursive sieve

The recursive sieve is essentially already parallel; a new process is started for each prime. NAP can be used to easily extend this parallelism across a network, but you will find that the performance is lacking compared to the previous implementations.