Pyre Primer
An Introduction to Parallel Scientific Programming with Python And Pyre. Patrick Hung (please send feedback to patrickh at caltech.edu)
In this tutorial, we will go through a series of example that includes
- MPI Python 101: Computing pi
- Writiting a simple Python bindings for existing software libraries
- MPI Python 102: Computing pi with multiprecision arithmetics
The "hello world" of MPI programming is numerical integration. Many applications of scientific interest can be reduced to this problem.
In our first example, we will follow Section 3.3 of Using MPI and compute Pi = 4*arctan(1) by approximating arctan as an integral:
.
[Obligatory warning: If you actually need the value of Pi to lots of digits, do not use this code.]
The following "C" function
double pi_arctan(int nseg, int myid, int numprocs) {
double h, sum, x;
int i;
h = 1.0 / (double)nseg;
sum = 0.0;
for (i=myid+1; i<=nseg; i += numprocs) {
x = h * ( (double)i - 0.5 );
sum += 4.0 / (1.0 + x*x);
}
return h * sum;
}
accomplishes the following
- Domain decomposition: the interval [0, 1] is uniformly divided into nseg pieces
- Elementary load balancing: For any nseg and any numprocs > 0, the number of segments allocated to any processors are within 1 of any other.
- Uniprocessor/serial case corresponds to binding: pi_arctan_serial(nseg) as pi_arctan(nseg, 0, 1)
We will not worry about using the above function in a "C" driver, an example can be found here: http://www-unix.mcs.anl.gov/mpi/usingmpi/examples/simplempi/main.htm
We will also not worry about calling that function in Python, although writing the Python binding for pi_module that enable the following is trivial. (In fact, can be automated with SWIG)
#import pi_module
def pi_arctan_serial(nseg):
return pi_module.pi_arctan(nseg, 0, 1)
for nseg in [1000, 10000, 100000]:
print "Approximate value of Pi with %d segments: %f" % (nseg, pi_arctan_serial(nseg) )
Instead, we will define the function simplepi in the module _mpn as follows
PyDoc_STRVAR(simplepi_doc, "the hello world of mpi programming, prints Pi !");
static PyObject *
_mpn_simplepi(PyObject *self, PyObject *args)
{
journal::debug_t info("mpn.simplepi");
int nseg, myid, numprocs;
if (! PyArg_ParseTuple(args, "iii", &nseg, &myid, &numprocs) )
return NULL;
double mypi = 0, pi = 0;
if (numprocs == 0) {
// serial code
info << journal::at(__HERE__) << " running in serial mode " << journal::endl;
pi = pi_arctan(nseg, 0, 1);
}
else {
#if defined(PARALLEL) && defined(WITH_MPI)
MPI_Status status;
mypi = pi_arctan(nseg, myid, numprocs);
MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
#endif
}
return PyFloat_FromDouble(pi);
}
The Pyre driver test_simplepi.py is included below
#!/usr/bin/env python
import mpn
from mpi.Application import Application
class SimpleApp(Application):
class Inventory(Application.Inventory):
import pyre.inventory
nseg = pyre.inventory.int("nseg", default=10000)
nseg.meta['tip'] = 'divides the interval [0,1] into how many segments ?'
def main(self):
import mpi
nsegs = self.inventory.nseg
world = mpi.world()
my_pi = mpn._mpn.simplepi(nsegs, world.rank, world.size)
the_pi = 3.14159265358979323846264338327950288
if (world.rank == 0) :
print "Approximate Pi: %.25f. Rel. Err. : %16.8e" % (my_pi, abs(my_pi-the_pi))
return
def __init__(self):
Application.__init__(self, "simple")
return
# main
if __name__ == "__main__":
import journal
journal.debug("mpn.simplepi").activate()
journal.debug("launcher").activate()
app = SimpleApp()
app.run()
# end of file: test_simplepi.py
In the above, we defined our class SimpleApp as an mpi.Application. See the pyre mpi tutorial for additional information.
Running test_simplepi.py at the command-line will result in the following output.
$ test_simplepi.py >> _mpn.cc:277:<unknown> -- mpn.simplepi(debug) -- running in serial mode Approximate Pi: 3.1415926544231340677981734. Rel. Err. : 8.33340952e-10Deactivating the debug output plus a simple change (exercise) will display the computed value of Pi with the incorrect digits parenthesized.
$ test_simplepi.py --nseg=1000 Approximate Pi: 3.141592(73692312268622117699123919010). Rel. Err. : 8.33333296e-08 $ test_simplepi.py --nseg=10000 Approximate Pi: 3.14159265(442313406779817341885063797). Rel. Err. : 8.33340952e-10 $ test_simplepi.py --nseg=100000 Approximate Pi: 3.1415926535(9816064488995834835804999). Rel. Err. : 8.36752889e-12 $ test_simplepi.py --nseg=1000000 Approximate Pi: 3.1415926535897(6336202090351434890181). Rel. Err. : 2.97539771e-14 $ test_simplepi.py --nseg=10000000 Approximate Pi: 3.1415926535897(2872306253520946484059). Rel. Err. : 6.43929354e-14 $ test_simplepi.py --nseg=100000000 Approximate Pi: 3.1415926535(9023454067255443078465760). Rel. Err. : 4.41424675e-13
Running in parallel / MPI mode is easy.
$ test_simplepi.py --launcher.nodes=2 Approximate Pi: 3.1415926544231318473521242. Rel. Err. : 8.33338731e-10With --launcher.nodes=2, the script will be launched in parallel mode. On most platforms, an MPI enabled (i.e., MPI_Init is called at startup, and MPI_Finalize at shutdown) interpreter "mpipython.exe" (as part of the mpi package under Pythia), will be launched instead of the standard python interpreter.
The following will launch the script on your specified set of nodes
$ test_simplepi.py --launcher.nodegen=a%03d --launcher.nodelist=1-3 Approximate Pi: 3.14159265(442312651828160596778616309). Rel. Err. : 8.33333402e-10The command-line argument --launcher.nodelist is a Slice object defined in the module pyre.inventory.properties. The flag "nodegen" instructs the interpreter what string interepolation to use to convert the natural numbers (from nodelist) to machine node names. In this case, the desired translation is 1 -- a001, 2 -- a002, etc. Examining LauncherMPICH.py reveals that the script is to be relaunched with the following command.
mpirun -np 3 -machinefile mpirun.nodes `which mpipython.exe` ./test_simplepi.py \ --launcher.nodelist=1-3 --launcher.nodegen=a%03d --mode=workerBEGIN ASIDE
Now, nodelist appears rather interesting. Let's see more of what it can do. The quickest way to learn about the structure of pyre is with pydoc. Running the following on your workstation will start the pydoc server at port 8888
pydoc -p 8888
You can then use your browser of choice to look through all the modules that are found in the default search paths at http://localhost:8888.
Alternatively, you can fire up the pydoc search GUI with
pydoc -g
Entering Slice into the box reveals that it is defined by pyre.inventory.properties.Slice. Let's start the python interpreter and play with it a little.
>>> from pyre.inventory.properties.Slice import Slice
>>> test = Slice("test")
>>> test._cast('1-3')
[1, 2, 3]
>>> test._cast('5-1')
[5, 4, 3, 2, 1]
>>> test._cast('1-10,5-1')
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 4, 3, 2, 1]
>>> test._cast('1-3,4,5,98-100')
[1, 2, 3, 4, 5, 98, 99, 100]
>>> test._cast('-10-10')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/home/patrickh/dev/products/modules/pyre/inventory/properties/Slice.py", line 32, in _cast
raise TypeError("property '%s': could not convert '%s' to a slice" % (
TypeError: property 'test': could not convert '-10-10' to a slice
We have now seen some of what Slice can, and cannot, do. Note that by calling the member function _cast, with its leading underscore indicating by convention that it is part of the implementation detail, is not part of its standard usage.
END ASIDE
The official way to handle "informative" statements in pyre is to use the journal package. Inside the function launch of LauncherMPICH.py is the following
def launch(self):
args = self._buildArgumentList()
if not args:
return False
command = " ".join(args)
self._info.log("executing: {%s}" % command)
[truncated ]
Looking at the constructor of LauncherMPICH, we see that (in a not so obvious manner) the class will register itself with the string "mpirun." Activating it is as simple as modifyling test_simplepi.py to include the additional line:
if __name__ == "__main__":
import journal
journal.info("mpirun").activate()
app = SimpleApp()
app.run()
Rerunning with journal.info enabled
$ ./test_simplepi.py --launcher.nodelist=31,35,37 --launcher.degen=a%03d
>> /home/patrickh/dev/products/modules/mpi/LauncherMPICH.py:40:launch
-- mpirun(info)
-- executing: {mpirun -np 3 -machinefile mpirun.nodes `which mpipython.exe` ./test_simplepi.py --launcher.nodelist=31,35,37 --launcher.nodegen=a%03d --mode=worker}
Approximate Pi: 3.14159265(442313229144133401860017329). Rel. Err. : 8.33339175e-10
The horrible thing with this example is that this code performs best serially. The serial version will compute with 10 million segments faster than mpi can startup. And as can be seen earlier, running with more segments will not yield more accurate results because of the inherent limitations of floating-point arithmetics.
In Part 2, we will see how python extension modules are built pyre-style using the config/build procedure.
And then in Part 3, we will look at yet another contrived pi example that can take advantage of parallelism.
