To allow you to run your programs on otherwise idle machines, I've set up a Sun Grid Engine (SGE) installation for use by the class. SGE is an open-source job-scheduling system developed by Sun Microsystems. Users submit jobs to SGE, which queues them and schedules them to run on the machines. Once one job finishes, SGE schedules a new job on that machine. SGE also balances the resources in use by each user, preventing any one user from monopolizing the compute resources.
This document explains how to use our SGE setup. Other documentation can be found online at: http://gridengine.sunsource.net/
To get started, first log into eniac.seas.upenn.edu using SSH:
ssh eniac.seas.upenn.edu
Next, if you are a csh/tcsh user, type the following command (or add it to your .cshrc):
source /project/cis/acg/sge/default/common/settings.csh
If you are a sh/ksh/bash user:
. /project/cis/acg/sge/default/common/settings.sh
This will set several environment variables, such as $SGE_ROOT, and adjust your path so the SGE executables and manual pages can be found.
To submit a test job, first change to a directory writeable by anyone in the cis534s unix group. Such a directory has already been created for you:
cd /project/cis/cis534-10a/users/$USER
Then, run the following command:
qsub -m as -M $USER -cwd -j y -q core2-quad.q -pe threaded 1 $SGE_ROOT/examples/jobs/sleeper.sh 60
This command submits the script "sleeper.sh" with a parameter of 60 (which causes it to sleep for 60 seconds). The "-cwd" option tells qsub to put the output files in the current working directory. The "-j y" option tells it to redirect the stdout and stderr streams into a single output file. The "-q core2-quad.q" option tells qsub to send the job only to the machines in that specific queue (all the Core 2 quad-cores, in this case). The "-pe threaded 1" option tells qsub to request one hardware thread. If you want to ensure your job is the only one on the system, set the number after "-pe threaded" to the number of hardware threads on the machine.
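If you get tired of retyping these flags, qsub also reads options embedded in the script itself on lines beginning with "#$". Here's a minimal sketch (submit.sh is a name I made up; the flags mirror the command above, minus the email options, which need $USER expanded and so are easier left on the command line):

```shell
# Create a hypothetical submit script; qsub treats lines starting
# with "#$" as if they were command-line options.
cat > submit.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -j y
#$ -q core2-quad.q
#$ -pe threaded 1
sleep "${1:-60}"
echo "done sleeping"
EOF
chmod +x submit.sh
# Then submit it with just: qsub submit.sh 60
```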
After running the above command, it should say something like:
Your job 17 ("Sleeper") has been submitted
You can then check the queue:
qstat -f
After a minute or so, you will have some output files in the current directory (one each for standard output and error), owned by the user acgsge:
-rw-r--r-- 1 acgsge cis534s  0 2006-11-01 09:41 Sleeper.e17
-rw-r--r-- 1 acgsge cis534s 95 2006-11-01 09:42 Sleeper.o17
If you want to test out submitting a bunch of jobs, just run qsub multiple times and then watch the jobs queue up and execute.
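For example, a little loop like this queues up several sleeper jobs at once (a sketch: SUBMIT defaults to echo so you can dry-run and inspect the commands first; set SUBMIT=qsub to actually submit):

```shell
# Dry-run by default: prints the qsub commands instead of running them.
# Run with SUBMIT=qsub to really submit the jobs.
SUBMIT=${SUBMIT:-echo}
for i in 1 2 3 4 5; do
    $SUBMIT -cwd -j y -q core2-quad.q -pe threaded 1 \
        "$SGE_ROOT/examples/jobs/sleeper.sh" 60
done
```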
To compile and run the code from the homework locally:
cd /project/cis/cis534-10a/users/$USER
cp ~cis534/html/homework/hw2/compute.C .
cp ~cis534/html/homework/hw2/compute.h .
cp ~cis534/html/homework/hw2/driver.C .
chmod g-rwX compute.C compute.h driver.C
g++ -Wall -O3 compute.C driver.C -lrt -o compute-x86
./compute-x86 --computation 1 --particles 20000 --trials 5
If this works, then try running it on the grid:
qsub -m as -M $USER -cwd -b y -j y -q core2-quad.q -pe threaded 1 ./compute-x86 --computation 1 --particles 20000 --trials 5
The "-b y" flag tells qsub to accept a binary rather than a script.
To compile your code for the SPARC machine, you'll use the cross-compiler I set up for the class:
~cis534/public/cross/bin/sparc-sun-solaris2.10-g++ -R /home1/c/cis534/public/cross/sparc-sun-solaris2.10/lib/sparcv9/ -m64 -Wall -O3 compute.C driver.C -lrt -o compute-sparc
The "-R" flag tells the linker where to look for libraries. You can verify it worked by running the "file" command:
file compute-sparc
Which should return "ELF 64-bit MSB executable, SPARC V9...". To then run it on arachnid via SGE:
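A tiny helper (check_arch is a name I invented) can automate that check before you submit, so a mis-built binary never makes it into the queue:

```shell
# Hypothetical helper: succeed only if `file` reports the expected
# architecture string for the given binary.
check_arch() {
    file "$1" | grep -q "$2"
}

# Usage before submitting:
#   check_arch compute-sparc 'SPARC V9' || echo "wrong architecture!"
```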
qsub -m as -M $USER -cwd -b y -j y -q sparc.q -pe threaded 1 ./compute-sparc --computation 1 --particles 20000 --trials 5
To see the jobs in the queue by all users:
qstat -f -u "*"
To see the machines in the cluster:
qhost
To see which machines are running which jobs:
qhost -j
To see details of each machine:
qhost -F
To see who has been using the cluster:
qacct -o
To delete all your jobs:
qdel -u $USER
For more information on these commands, type "man qstat", "man qsub", and so on.
Next, read the rest of this document for important information on how to use the cluster.
There are several different "queues" available, each configured to contain a homogeneous set of machines:
The core2-quad.q queue has the most machines (right now just a few, but we have 32 of these dual-socket dual-core machines). This is the default queue. Specify it as:
qsub -q core2-quad.q -pe threaded 4 ...
This will schedule the job on our only dual-socket quad-core machine (8 cores total):
qsub -q core2-oct.q -pe threaded 8 ...
This will schedule the job on our only single-socket, quad-core, two-thread-per-core Core i7 system (8 hardware threads total). This machine was generously donated by Intel:
qsub -q corei7.q -pe threaded 8 ...
This will schedule the job on our only dual-socket Niagara T2 machine (128 hardware threads total). This machine was generously donated by Sun Microsystems:
qsub -q sparc.q -pe threaded 128 ...
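The thread counts above can be captured in a tiny helper (slots_for_queue is a name I made up) so grabbing a whole machine is one less number to remember:

```shell
# Map each queue to its total hardware-thread count (numbers taken
# from the queue descriptions above).
slots_for_queue() {
    case "$1" in
        core2-quad.q) echo 4 ;;
        core2-oct.q)  echo 8 ;;
        corei7.q)     echo 8 ;;
        sparc.q)      echo 128 ;;
        *)            echo 1 ;;
    esac
}

# e.g.: qsub -q sparc.q -pe threaded $(slots_for_queue sparc.q) ...
```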
Each of these queues is configured to allow each job to run for 5 minutes at most. After five minutes of runtime, SGE will kill off the job. I've instituted this policy to ensure that nobody can accidentally monopolize the cluster resources because of a runaway job.
Before going further, some information about files and file permissions.
The way we have set up SGE, SGE runs all jobs as the user "acgsge". Running it as a non-root user has some advantages (security and not needing root to configure it), but it causes a few headaches. The two main issues are: (1) file permissions and (2) file space quota issues.
As mentioned, the "acgsge" user is a member of the "cis534s" unix group (which you should all be members of as well). This means that all directories you want your job to read from or write to must have the correct group permissions. This is probably the number one source of problems you'll encounter when first using our SGE setup.
To help manage file permissions (and also to deal with the quota issue described below), we have a dedicated project file space just for CIS534. I have set up a directory for your files in the shared file space:
/project/cis/cis534-10a/users/$USER
To keep files and directories read/write-able by the "cis534s" group, you'll want to do a few things.
Set the "set group ID" on directories. You can do this by:
chgrp cis534s directoryname
chmod -R g+rwXs directoryname
This will make sure that all files (and subdirectories) created in this sub-directory inherit the group. Note: the user directory you start out with should already have this set, so you shouldn't need to change anything.
Use "cp" instead of "mv". If you use "cp" to copy files, they will inherit the permissions correctly. If you use "mv", they won't. Thus, avoid using "mv" in favor of "cp".
Set your "umask". I set mine using "umask 7". This makes all files you create read/write-able by you and the group (but not others). Works for me. You can set this in your .cshrc or bash equivalent.
Finally, you might want to make setting the file permissions part of your SGE submit script. This is what I did when I was in graduate school, and it prevented lots of mistakes on my part.
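You can try the steps above out safely in a scratch directory first. Here's a self-contained sketch (it uses a temporary directory rather than the real project space, and skips the chgrp since the group there will vary):

```shell
# Make new files group read/write-able, and set the setgid bit on
# the directory so files created inside it inherit its group.
umask 007                      # new files: rw for user+group, nothing for others
dir=$(mktemp -d)               # stand-in for your project directory
chmod g+rwxs "$dir"            # group rwx plus the set-group-ID bit
touch "$dir/example.txt"
ls -ld "$dir" "$dir/example.txt"
```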
As the scheduling software runs jobs as the acgsge user, all files it creates are owned by that user. If your programs write files into any CETS home directory, those files will count towards the quota of the acgsge user. This can cause quota problems. To avoid this, always write files to the shared space we have set up for the class (described above). Quota for that space is handled differently, so the problem is avoided.
The directory $SGE_ROOT/examples/jobs/ has several example submission scripts. Looking at the various man pages and -help flags is also useful. There are also lots of pages on the web you might find helpful.
Some other helpful links:
Some tips: http://gridengine.sunsource.net/howto/troubleshooting.html
Problem: a user is able to submit jobs, but they stay in the "pending" state for an indefinite amount of time.
The actual error you will get when you submit your job is like this:
Unable to run job: warning: your_username your job is not allowed to run in any queue
Your job your_jobid ("your_jobname") has been submitted.
Exiting.
(for each job you submit)
Solution: the submitting user may need to be added to the group of acg users using the "qmon" tool.
Problem: a job sits in the queue with status Eqw, which means that the job's location directory was not given the correct group permissions. Solution: just fixing the permissions will not solve the problem. You must kill the jobs, fix the permissions (chgrp acg your_dir; chmod g+rwx your_dir), then start them again. This time they should work. See the section on group permissions for how to avoid this.
Problem: jobs don't start but instead spit out a seemingly infinite error stream that says "tset: standard error: Inappropriate ioctl for device". Solution: check your .login file for terminal setting problems. For example, .login code like the following could cause problems:
loop:
## If modem dialup or vt100, use vt100
if ($TERM == network || $TERM =~ *[vV][tT]*100) eval `tset -QIs vt100`
## If don't know, ask (default to vt100). Otherwise, use it.
if ($TERM == '' || $TERM == unknown) then    # don't know?
    eval `tset -QIs \?vt100`                 # then ask (default vt100)
else
    eval `tset -QIs $TERM`                   # know? use it
endif
if ($TERM == unknown || $TERM == '') goto loop
The fix is to wrap the interactive-only code like this:
if ( { [ -t ] } ) then
    (do the interactive-only stuff here)
endif
SGE does not take a snapshot of the executable you specify when submitting jobs. This means that if half your jobs start running and the other half are queued up and you then change your executable, the jobs that have not yet started will execute the updated executable, not the original one.
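One workaround (my suggestion, not part of the official setup) is to submit a uniquely named copy of the binary, so rebuilding compute-x86 later cannot change jobs already sitting in the queue. A self-contained sketch, using a placeholder file in place of the real binary:

```shell
# Placeholder stands in for the real compute-x86 binary so this
# sketch runs anywhere.
echo "placeholder binary" > compute-x86

# Copy the binary under a unique, timestamped name and submit *that*
# copy; later rebuilds of compute-x86 leave queued jobs untouched.
stamp=$(date +%Y%m%d-%H%M%S)
cp compute-x86 "compute-x86.$stamp"
echo "submit with: qsub ... -b y ./compute-x86.$stamp <args>"
```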