CSE6331: Working on SDSC Comet

The SDSC Comet Cluster

All programming assignments and projects will be done on the Comet cluster at SDSC (San Diego Supercomputer Center). The Comet cluster has 1944 nodes. Each node has 24 cores (two 12-core Intel Xeon E5 2.5 GHz processors), 128 GB of memory, and a 320 GB SSD for local scratch space. More information can be found at the SDSC Comet site.

To use Comet, you need to create a free XSEDE User Portal account, using your MavID (e.g., xyz1234) as your XSEDE username. Go to the XSEDE portal site and select Create Account. On the first page, you choose a registration key, which can be any sequence of 6 letters. Then, after you receive a confirmation email, you specify your XSEDE username (which must be your MavID) and a password. You should register within the first two weeks of class.

How to Login to Comet

The GTA will create accounts on Comet for all students registered on XSEDE. This will be done during the third week of the semester. The GTA will not be able to create a Comet account for you if you haven't created an XSEDE User Portal account.

Your XSEDE username is your MavID (or whatever you used when you registered). Your Comet username, though, is based on your first and last name, so it is not your MavID. Your XSEDE and Comet passwords are the same (this is the password you specified when you registered). The first thing to do is to log in to XSEDE. For example, from a Linux or Mac machine, you may log in using:

ssh xyz1234@login.xsede.org
where xyz1234 is your XSEDE username (your MavID). On Windows, you must install a secure shell client (see User Guides - Unix). Then, from XSEDE, you can log in to Comet:
gsissh comet
Your Comet username appears at the command prompt. Now that you know your username, next time you may log in to Comet directly:
ssh username@comet.sdsc.xsede.org
where username is your Comet username. The Comet login nodes run CentOS Linux.

The login node is for submitting jobs, downloading data, editing programs, etc. You cannot compile Java programs on a login node. The compute nodes, on the other hand, are the computers that do the heavy-duty work of running your programs. However, you do not interact with compute nodes directly. Instead, you ask the scheduler, SLURM (the Simple Linux Utility for Resource Management), to allocate compute nodes for your application program; SLURM then finds available compute nodes and runs your program on them. The files you see on the login node are shared among all nodes in the cluster. Use the login node only for general tasks, such as editing source programs and submitting jobs to the cluster. You may use nano or emacs to edit your source programs. For compiling and running code, use the SLURM sbatch command (to be explained), which submits a job to the cluster.
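To make the job-submission workflow concrete, here is a minimal sketch of writing a SLURM job script and submitting it with sbatch. It is not the course's build script: the file names (example.job, Example.java), the partition name (compute), and the resource values are all illustrative assumptions, so check the settings given for each assignment before reusing it.

```shell
#!/bin/bash
# Sketch: write a small SLURM job script and submit it if sbatch is available.
# Assumptions (not from the course page): the file names example.job and
# Example.java, the partition name "compute", and all resource values.

cat > example.job <<'EOF'
#!/bin/bash
# Job name shown by squeue
#SBATCH --job-name=example
# Output file; %j expands to the job ID
#SBATCH --output=example.%j.out
# Partition to run in (assumed name)
#SBATCH --partition=compute
# Number of compute nodes and tasks per node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Wall-clock limit (hh:mm:ss); the job is killed when it is reached
#SBATCH --time=00:10:00

# These commands run on the allocated compute node, not on the login node.
javac Example.java
java Example
EOF

# Submit only where SLURM is installed (for example, on a Comet login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch example.job
else
    echo "sbatch not found; wrote example.job without submitting"
fi
```

Keeping the time limit short is a cheap safeguard: a program that hangs is stopped when the limit is reached instead of consuming SUs indefinitely.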

Usage Limits

The cse6331 class has been allocated a total of 40,000 SUs (1 SU = 1 core for 1 hour), which corresponds to 1,000 SUs per student. As a rough guide, each student should use at most 100 SUs for each programming assignment and at most 400 SUs for the final project. You may see your total account usage using:

show_accounts username
where username is your Comet username. If a student has exceeded the SU limit by the end of the semester, there will be a 5% penalty on the student's final score.
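Since 1 SU is 1 core for 1 hour and each Comet node has 24 cores, you can estimate the cost of a job before submitting it. The helper below is a rough sketch, not an official accounting tool: the function name su_cost is hypothetical, and it assumes that every core of every allocated node is charged for the job's full running time, rounded up to a whole SU.

```shell
#!/bin/bash
# su_cost NODES CORES_PER_NODE MINUTES
# Prints an estimate of the SUs charged for a job.
# Assumption: all cores on each allocated node are billed for the whole
# run (1 SU = 1 core for 1 hour), and the total is rounded up.
su_cost() {
    local nodes=$1 cores=$2 minutes=$3
    # core-minutes divided by 60, rounded up with integer arithmetic
    echo $(( (nodes * cores * minutes + 59) / 60 ))
}

su_cost 2 24 30    # 2 nodes x 24 cores x 0.5 hour = 24 SUs
su_cost 4 24 60    # 4 nodes x 24 cores x 1 hour   = 96 SUs
```

Under these assumptions, a single one-hour run on 4 nodes uses almost the entire 100 SU budget of an assignment.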

Other SLURM commands:

squeue -u username     # print info about pending jobs of the user 'username'
scancel <jobid>        # cancel the job with this jobid
sinfo                  # view the state of the cluster's partitions and nodes

How to minimize your SU usage: Always test your programs in local mode first. Only when you are sure that your program works correctly on small data in local mode should you run it in distributed mode. Scripts will be provided to build and run your programs in local and distributed modes.

More Documentation

The following site gives more information about SLURM:

We thank XSEDE for awarding us an Education grant that gives students taking this course access to the SDSC Comet HPC cluster.

Last modified: 08/30/2017 by Leonidas Fegaras