This chapter describes how to move the heart of the portal backend – the daemon – from one machine to another.
The daemon must be run on a host which is also running a web server. This is so that users can download simulation output files via HTTP. The daemon uses scp to copy the output files from the TeraGrid cluster; the daemon is configured to place them under ~/public_html, where they can be served by Apache.
The daemon is flexible enough to run on a machine distinct from the one which hosts the main Django app. However, both the daemon and the Django app are typically run on the same server.
The daemon is usually run under its own Unix account (portal) but this is not a requirement.
Before you begin, shut down the daemon on the old server, so that the portal’s state does not change during the move:
$ ps aux | grep stackless
portal 15467 0.0 0.0 71704 15712 ? S 14:36 0:02 stackless daemon.py
portal 17587 0.0 0.0 5168 856 pts/0 S+ 17:46 0:00 grep stackless
$ kill -INT 15467
$
Before you shut down the daemon, make sure there are no active runs. One way to do this is to check the hidden list.py page in your web browser (i.e., geodynamics.org/portals/seismo/<secret>/runs/list.py). If the last few entries have a status of either done or error, then the portal is probably idle.
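The idle rule above can be written down as a small check. This is only a sketch: the function name and the `recent` parameter are hypothetical, and it assumes you have already read the statuses off the list.py page, oldest first. Only the status strings done and error come from the text.

```python
# Hypothetical helper. "done" and "error" are the finished statuses named
# in the text; "pending" below stands in for any unfinished status, since
# the real names of the other statuses are not listed here.
FINISHED = {"done", "error"}

def portal_looks_idle(statuses, recent=5):
    """Return True if the last `recent` statuses are all finished.

    `statuses` is a list of status strings, oldest first.
    An empty list counts as idle.
    """
    tail = statuses[-recent:]
    return all(s.strip().lower() in FINISHED for s in tail)

print(portal_looks_idle(["done", "error", "done"]))
print(portal_looks_idle(["done", "pending"]))
```

A True result only means the portal is probably idle, which is the same level of confidence the manual check gives.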
The output files for the seismo portal consume many gigabytes, so it is best to start copying them right away:
$ scp -r ~portal/public_html/output portal@www.geodynamics.org:/home/portal/public_html
Note: The Django app has support for purging the output files of a run. However, it is not as simple as merely removing the files on disk. The corresponding ArchivedRun records in the database must also be removed so that the Django app knows that the run has been purged. A script to purge old runs could be written with little effort.
The daemon.py script depends on Stackless Python and CIG Pythia.
So the first step is to install Stackless on the destination server:
$ wget http://www.stackless.com/binaries/stackless-264-export.tar.bz2
$ tar xjf stackless-264-export.tar.bz2
$ cd stackless-2.6.4
$ ./configure --prefix=$HOME/opt/stackless
$ make && make install
Next, create a symlink on your PATH so that you can easily run Stackless Python:
$ cd ~/bin
$ ln -s $HOME/opt/stackless/bin/python stackless
Note how we name the symlink stackless, so as not to conflict with the system python.
Install CIG Pythia under the same directory where you installed Stackless:
$ wget http://pypi.python.org/packages/source/p/pythia/pythia-0.8.1.10.tar.gz
$ tar xzf pythia-0.8.1.10.tar.gz
$ cd pythia-0.8.1.10
$ stackless setup.py install --prefix=$HOME/opt/stackless
Note how we used stackless to build and install Pythia. This is especially important if the version of Stackless does not match the system python.
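Before moving on, it can help to confirm that both dependencies are importable from the interpreter the daemon will use. The sketch below is written to run on Python 2.6 and later, so it can be run with the stackless symlink. The module name stackless is what Stackless Python provides; pyre is assumed here to be the package that Pythia installs, so adjust the list if your installation differs.

```python
# Run this with the `stackless` symlink, not the system python.

def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

# "pyre" is an assumption about what Pythia installs; see the note above.
REQUIRED = ["stackless", "pyre"]

missing = missing_modules(REQUIRED)
if missing:
    print("missing: " + ", ".join(missing))
else:
    print("all dependencies importable")
```

If stackless itself is reported missing, the script was most likely run with the system python rather than the symlink.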
The daemon uses ssh and scp to connect to the remote TeraGrid cluster. This is achieved through a “passphrase-less” SSH key installed on the TeraGrid cluster.
For security reasons, the SSH public key listed in ~/.ssh/authorized_keys on the TeraGrid cluster is restricted using from="geodynamics.org". You may have to update this so that the daemon can connect from its new host.
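One cautious way to update the restriction is to add the new host to the existing from= pattern list instead of replacing it, so the old host keeps working until the move is finished. The sketch below edits a single authorized_keys line. The function name is hypothetical, the key material in the example is a placeholder, and www.geodynamics.org is simply the new host used elsewhere in this chapter.

```python
import re

# Matches a from="..." option on an authorized_keys line.
FROM_OPTION = re.compile(r'from="([^"]*)"')

def allow_host(line, host):
    """Add `host` to the from="..." pattern list of one authorized_keys line.

    Lines without a from= option are returned unchanged.
    """
    match = FROM_OPTION.search(line)
    if match is None:
        return line
    patterns = [p for p in match.group(1).split(",") if p]
    if host not in patterns:
        patterns.append(host)
    new_option = 'from="%s"' % ",".join(patterns)
    return line[:match.start()] + new_option + line[match.end():]

# The key material and comment below are placeholders, not a real key.
old = 'from="geodynamics.org" ssh-rsa AAAA...placeholder portal-daemon'
print(allow_host(old, "www.geodynamics.org"))
```

Whether a hostname or an IP address pattern is appropriate depends on how sshd on the cluster resolves the connecting host, so test a connection from the new host after editing the file.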
Most importantly, you must copy the corresponding public/private key pair, ~/.ssh/daemon_id_rsa.pub and ~/.ssh/daemon_id_rsa, from the daemon’s old host to the new one.
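After copying the key pair, it is worth checking that both files arrived and that the private key is not accessible to other users, since ssh refuses to use a private key whose permissions are too open. The following is a minimal sketch for a POSIX system; the function name is hypothetical and the path is the one given above.

```python
import os
import stat

def key_problems(private_key):
    """Return a list of human-readable problems with an SSH key pair."""
    problems = []
    public_key = private_key + ".pub"
    for path in (private_key, public_key):
        if not os.path.isfile(path):
            problems.append("missing: " + path)
    if os.path.isfile(private_key):
        mode = stat.S_IMODE(os.stat(private_key).st_mode)
        # Any group or other permission bit on the private key is too open.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("permissions too open (%o): %s" % (mode, private_key))
    return problems

for problem in key_problems(os.path.expanduser("~/.ssh/daemon_id_rsa")):
    print(problem)
```

No output means the pair is present and the private key is restricted to its owner; chmod 600 on the private key fixes the permissions case.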
You are now ready to create a working directory for the daemon – home of the daemon.py script, its configuration file, and its log file. You may copy daemon.py from the old host, or simply grab it from the Subversion server:
$ mkdir ~/daemon
$ cd ~/daemon
$ svn cat http://geodynamics.org/svn/cig/cs/portal/trunk/northridge/backend/daemon.py > daemon.py
Next, copy web-portal-daemon.cfg from the old host to the ~/daemon directory you just created on the new host. You need to change only a handful of lines in this file (if any): those corresponding to the output root, and those which mention the SSH private key.
First, there is the output root:
[web-portal-daemon]
output-root-pathname = /home/portal/public_html/output/
output-root-url = http://crust.geodynamics.org/~portal/output/
#...
The daemon sends output-root-url to the Django app (using HTTP POST) to identify the base URL from which output files may be downloaded. This needs to be changed to reflect the daemon’s new home (assuming the hostname or path is different):
output-root-url = http://www.geodynamics.org/~portal/output/
If you placed the output files in a different location, you will need to change output-root-pathname as well.
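The two settings must describe the same directory: output-root-pathname is where the files live on disk, and output-root-url is where the web server exposes that directory. The sketch below illustrates the relationship. How the daemon actually builds its URLs is not shown in this chapter, so the function and the example file name are hypothetical.

```python
import posixpath

# Values taken from the configuration discussed above.
OUTPUT_ROOT_PATHNAME = "/home/portal/public_html/output/"
OUTPUT_ROOT_URL = "http://www.geodynamics.org/~portal/output/"

def download_url(file_path,
                 root_path=OUTPUT_ROOT_PATHNAME,
                 root_url=OUTPUT_ROOT_URL):
    """Map a file under the output root to the URL that would serve it."""
    relative = posixpath.relpath(file_path, root_path)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("%s is not under %s" % (file_path, root_path))
    return root_url.rstrip("/") + "/" + relative

# "42/out.tar.gz" is a made-up file name used only for illustration.
print(download_url("/home/portal/public_html/output/42/out.tar.gz"))
```

If a file on disk maps to a URL that the web server does not serve, one of the two settings is out of step with the other.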
Second, there are three lines which mention the passphrase-less SSH private key used for connecting to the TeraGrid cluster. If the path to this file is different on the new host, update the ssh and scp configuration lines accordingly.
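Editing the file by hand is perfectly adequate for a handful of lines. If you would rather script the change, the sketch below rewrites the value of one key and leaves every other line, including comments, untouched. It assumes Python 3 and the simple key = value layout shown above; the function name is hypothetical, and it does not check which section the key belongs to.

```python
import re

def set_option(cfg_text, key, value):
    """Return `cfg_text` with every `key = ...` line set to `value`."""
    pattern = re.compile(r"^([ \t]*%s[ \t]*=[ \t]*).*$" % re.escape(key),
                         re.MULTILINE)
    # A function is used as the replacement so that backslashes in
    # `value` are not interpreted as regex escapes.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, cfg_text)
    if count == 0:
        raise KeyError(key)
    return new_text

sample = (
    "[web-portal-daemon]\n"
    "output-root-pathname = /home/portal/public_html/output/\n"
    "output-root-url = http://crust.geodynamics.org/~portal/output/\n"
    "#...\n"
)
print(set_option(sample, "output-root-url",
                 "http://www.geodynamics.org/~portal/output/"))
```

Keep a copy of the original file before writing the result back, and compare the two with diff to confirm that only the intended lines changed.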
Congratulations! You are now ready to start the daemon:
$ cd ~/daemon
$ nohup stackless daemon.py &
Important: The daemon is not smart enough to disconnect from the terminal or go into the background by itself. So, both the nohup and the ampersand are critical to ensure that the daemon lives on after you log out.
Two log files should have been created in the working directory: nohup.out and journal.log. Use the following command to monitor the daemon’s activity:
$ tail -f journal.log
It is a good idea to arrange for the daemon to start automatically at boot. On Linux – or any system with Vixie Cron – this is easily achieved using crontab:
$ cat >crontab <<EOF
> PATH=$PATH
> PYTHONPATH=$PYTHONPATH
>
> @reboot cd /home/portal/daemon && nohup stackless daemon.py &
>
> EOF
$ crontab crontab
$
Important: The daemon must be started from its special working directory. It uses the current working directory to find the web-portal-daemon.cfg configuration file.
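The working-directory requirement is easy to get wrong when the daemon is started from a script. The sketch below is one hypothetical way to make it explicit from Python 3 on a POSIX system: it starts a command in a given directory, detached from the terminal, with output appended to a log file. It is an alternative to the nohup command line above, not how the portal itself is launched.

```python
import os
import subprocess

def start_detached(command, workdir, log_name="nohup.out"):
    """Start `command` in `workdir`, detached from the terminal.

    Output is appended to `log_name` inside `workdir`.
    """
    log = open(os.path.join(workdir, log_name), "ab")
    try:
        return subprocess.Popen(
            command,
            cwd=workdir,                # the daemon looks for its .cfg here
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,     # new session, so logout does not stop it
        )
    finally:
        # The child keeps its own copy of the file descriptor.
        log.close()

# Example call (not executed here):
# start_detached(["stackless", "daemon.py"], os.path.expanduser("~/daemon"))
```

Passing the directory as an argument means the caller's own current directory no longer matters.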
Remember how you ran scp to copy the output files from the old host to the new one? Now is a good time to check on its progress.
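A rough way to judge progress is to compare the number of files and the total size of the output tree on both hosts. The sketch below computes those two numbers for one directory; run it on each machine and compare the results. The function name is hypothetical, and matching totals suggest the copy is complete without proving it.

```python
import os

def tree_summary(root):
    """Return (file_count, total_bytes) for the files under `root`."""
    count = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                count += 1
                total += os.path.getsize(path)
    return count, total

# Path from the scp command earlier in this chapter, relative to the
# home directory of the account running this script.
print(tree_summary(os.path.expanduser("~/public_html/output")))
```

A directory that does not exist yields (0, 0), so check the path if both numbers are zero.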
Once the copy is complete, there is one final task to perform.
The old job records in the database still point to the web server on the daemon’s old host. If the URL for the new host is different (and you therefore changed the output-root-url setting as described under Configuration), you need to update the records in the database to point to the new server. This is achieved by running a simple Python script on the machine hosting the Django app:
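#!/usr/bin/env python
# Point every stored job URL at the new web server.
import SeismoWebPortal.models as m

for j in m.Job.objects.all():
    # Swap the old hostname prefix for the new one.
    j.url = j.url.replace("crust", "www")
    j.save()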
Obviously, you should customize this code as appropriate. Since this script accesses the database using Django, your environment must be configured to use Django, and you need to have write permission to the database. (Depending on your setup, the latter may involve becoming www-data using su.)
$ cd ~/www-data
$ . www-setup.sh
$ ./update-job-url.py
If you don’t have a copy of www-setup.sh, use debug-setup.sh as a template.
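One thing to watch when customizing the update script: a plain string replace changes every occurrence of the old text, including any that happen to appear in the path. The sketch below is a hypothetical helper that replaces only the host part of a URL and leaves everything else alone. It assumes Python 3 (on Python 2 the same functions live in the urlparse module), and the path in the example is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def move_host(url, old_host, new_host):
    """Replace the host of `url` if it equals `old_host`.

    URLs on any other host are returned unchanged. Any user:password
    part of the URL is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url
    if parts.port is None:
        netloc = new_host
    else:
        netloc = "%s:%d" % (new_host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))

# "crust-model/run1/" is a made-up path; note that it is left intact.
print(move_host("http://crust.geodynamics.org/~portal/output/crust-model/run1/",
                "crust.geodynamics.org", "www.geodynamics.org"))
```

Inside the update loop, the call would take the place of the replace() expression, for example j.url = move_host(j.url, "crust.geodynamics.org", "www.geodynamics.org").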
To verify that the new setup works: