ACG's Sun Grid Engine (SGE) Cluster

Our new 128-processor cluster consists of 32 machines, each with two sockets and two cores per socket. The processors are 64-bit 2.66 GHz Core 2 Duo Xeons, each with 4 MB of second-level cache.

The machines are named acggrid01.seas.upenn.edu through acggrid32.seas.upenn.edu. We can log into any of the machines directly, if needed (they just look like standard CETS-supported Linux boxes).

The cluster uses the open-source Sun Grid Engine (SGE) system for cluster job scheduling. The machine codex-l.cis.upenn.edu currently runs the SGE job scheduler.


File Storage and the 'acg' Unix Group

Before talking about SGE, here is some information about files and file permissions.

Each of you should have a directory for your files in the shared /mnt/eclipse/acg file space. For example, my directory is /mnt/eclipse/acg/users/milom/. All research-related files should be kept in /mnt/eclipse/acg. All private files should be kept in your account's home directory.

On the ACG grid, SGE runs all jobs as the user "acgsge", which is a member of the "acg" unix group. This means that all directories you want your job to read or write must have the correct permissions. This is probably the number one source of problems you'll encounter when first using our SGE setup.

To keep files and directories readable and writable by the acg group, you'll want to do a few things, as sketched below.
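Here is a minimal sketch (the myexperiment directory is a made-up example; substitute your own):

# make files you create group-writable by default
# (add this to your shell startup file)
umask 002

# hand the directory tree to the acg group and make it group-writable
chgrp -R acg /mnt/eclipse/acg/users/milom/myexperiment
chmod -R g+rw /mnt/eclipse/acg/users/milom/myexperiment

# set the setgid bit so new files created inside inherit the acg group
chmod g+s /mnt/eclipse/acg/users/milom/myexperiment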

Acgsge user quota problems: Because the scheduling software runs as the acgsge user, all files it creates are owned by that user. If your programs write files into any CETS home directory, those files count toward the quota of the acgsge user, which can cause quota problems. To avoid this, always write files to shared space such as /mnt/eclipse/acg/. Quota for such space is handled differently, so the problem is avoided.
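You can also point a job's standard output and error at shared space explicitly when you submit, using qsub's -o and -e flags. A sketch (the log directory and script name are made-up examples):

qsub -o /mnt/eclipse/acg/users/milom/logs/ -e /mnt/eclipse/acg/users/milom/logs/ myjob.sh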

Quickstart

Environment Setup

To get started, first log into codex-l.cis.upenn.edu using SSH:

ssh codex-l.cis.upenn.edu

Next, if you are a csh/tcsh user, type the following command (or add it to your .cshrc):

source /home1/a/acgsge/sge/default/common/settings.csh

If you are a sh/ksh user:

. /home1/a/acgsge/sge/default/common/settings.sh

This will set or expand the following environment variables:

- $SGE_ROOT         (always necessary)
- $SGE_CELL         (if you are using a cell other than "default")
- $SGE_QMASTER_PORT (if you haven't added the service "sge_qmaster")
- $SGE_EXECD_PORT   (if you haven't added the service "sge_execd")
- $PATH/$path       (to find the Grid Engine binaries)
- $MANPATH          (to access the manual pages)
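To sanity-check that the settings script took effect, echo one of the variables and confirm that the SGE binaries are now on your path:

echo $SGE_ROOT
which qsub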

Some Common Commands

To see the jobs in the queue:

qstat -f

To see the machines in the cluster:

qhost

To see which machines are running which jobs:

qhost -j

To see who has been using the cluster:

qacct -o

For more information on accounting, see the man page for "accounting".

Graphical Tool

To use the X11-based GUI monitoring and configuration tool:

qmon &

Submitting a Test Job

To submit a test job, first change to a directory writeable by anyone in the acg unix group. Then, run the following command:

qsub -cwd ~acgsge/sge/examples/jobs/sleeper.sh

It should say something like:

Your job 17 ("Sleeper") has been submitted

You can then check the queue:

qstat -f

After a minute or so, you will have some output files in the current directory (one each for standard output and error), owned by the user acgsge:

-rw-r--r-- 1 acgsge acg  0 2006-11-01 09:41 Sleeper.e17
-rw-r--r-- 1 acgsge acg 95 2006-11-01 09:42 Sleeper.o17

If you want to test out submitting a bunch of jobs, just run qsub multiple times and then watch the jobs queue up and execute.
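Eventually you'll want to submit your own jobs rather than the example sleeper. A submission script is just a shell script; lines beginning with "#$" are read by qsub as embedded command-line options. A minimal sketch (the script name and contents here are made-up examples):

#!/bin/sh
#$ -S /bin/sh
#$ -N MyTestJob
#$ -cwd
# the actual work: report the execution host, then pretend to compute
echo "Running on `hostname`"
sleep 30

Submit it with "qsub myjob.sh"; the -N (job name) and -cwd options are picked up from the script itself.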

Going Further

The directory ~acgsge/sge/examples/jobs/ has several example submission scripts. Looking at the various man pages and -help flags is also useful. There are also lots of pages on the web you might find helpful.

Additional Information

Requesting Resources

You can also tell SGE that a job needs a specific set of resources before it starts. To see the requestable resources:

qconf -sc
qstat -t -F
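Once you know a resource's name, you can request it at submission time with qsub's -l flag. A sketch (mem_free is a standard SGE load value, but the exact resource names and whether they are requestable depend on our configuration, so check qconf -sc first; myjob.sh is a made-up script name):

qsub -cwd -l mem_free=2G myjob.sh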

It is also helpful to look at the related man pages:

man complex
man host_conf


Checkpointing

Sun Grid Engine doesn't directly support checkpointing, but it does have hooks to let you use an automated checkpointing library or application-level checkpointing. One reasonable option is Condor's checkpointing library. Setting this up requires mucking with the SGE config, but doesn't look impossible.
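If someone does set this up, the SGE side would presumably look something like the following sketch: an administrator defines a checkpointing environment with qconf, and users request it with qsub's -ckpt flag. The environment name here is made up, and defining the environment itself is the config-mucking part:

# show the existing checkpointing environments, then add a new one (admin only)
qconf -sckptl
qconf -ackpt condor_ckpt

# submit a job that requests the hypothetical condor_ckpt environment
qsub -ckpt condor_ckpt myjob.sh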

Queue Scheduling Notes

Grid Engine schedules using shares and related policies ("share tree" averages usage over time; "functional" is an instantaneous priority). At least one web page advocates the functional policy because it is easiest to explain: if many users have jobs pending, the cluster is divided up proportionally among them. In their experience, "share tree" sounds good in theory, but in practice the usage history it incorporates makes it harder for end users to reason about.
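To see how the scheduler is currently weighted on our setup, dump the scheduler configuration; weight_tickets_functional and weight_tickets_share (see the sched_conf man page) are the relevant knobs, and changing them requires admin access via qconf -msconf:

qconf -ssconf | grep weight_tickets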


Grid Restart

To restart the daemons, log in as the user "acgsge". To start the qmaster and scheduler, on codex-l type:

/home1/a/acgsge/sge/default/common/sgemaster

On each of the remaining cluster nodes, launch execd with the following command:

/home1/a/acgsge/sge/default/common/sgeexecd

After you run this, the process listing should show the "execd" process running as user acgsge.
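Rather than logging into all 32 nodes by hand, you can loop over them. A sketch, assuming the acgsge user can ssh to each node without a password:

# seq -w zero-pads, giving acggrid01 through acggrid32
for i in $(seq -w 1 32); do
  ssh acggrid$i.seas.upenn.edu /home1/a/acgsge/sge/default/common/sgeexecd
done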

Configuring the Grid

To set the number of slots on a machine (for example, acggrid20) for queue all.q to zero:

qconf -rattr queue slots 0 all.q@acggrid20
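To put the machine back into service, set the slot count back to the number of cores (presumably four on these machines, one per core):

qconf -rattr queue slots 4 all.q@acggrid20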

Troubleshooting