CSE6331: Working on SDSC Comet

The SDSC Comet Cluster

All programming assignments and projects will be done on the Comet cluster at SDSC (the San Diego Supercomputer Center). Comet has 1944 nodes; each node has 24 cores (two 12-core Intel Xeon E5 2.5 GHz processors), 128 GB of memory, and a 320 GB SSD for local scratch space. More information can be found at the SDSC Comet site.

XSEDE and Comet Account Set-up

Logging into XSEDE portal and Comet system for the first time

The GTA will create Comet accounts for all students registered on XSEDE. This will be done during the fourth week of the semester, and the GTA will send an email when the accounts are ready. The GTA cannot create a Comet account for you unless you have already created an XSEDE User Portal account.

Your XSEDE username is your MavID (or whatever you used when you registered). Your Comet username, however, is based on your first and last name, so it is not your MavID. Your XSEDE and Comet passwords are the same (this is the password you specified when you registered).

After you get the email from the GTA, the first thing to do is to log in to XSEDE. The first login is complicated, but if you set it up correctly, later logins will be very easy. For example, from Linux, a Mac, or Windows 10, you may log in using:
ssh xyz1234@login.xsede.org
where xyz1234 is your XSEDE username (your MavID). On older versions of Windows, you must install a secure shell client, such as PuTTY. After you enter your XSEDE password, you will be asked for a passcode:
Duo two-factor login for xyz1234
Enter a passcode or select one of the following options:
 1. Duo Push to XXX-XXX-0000
 2. Phone call to XXX-XXX-0000
Passcode or option (1-2): 
Enter your passcode at this prompt. To get a passcode: if you have installed and set up Duo on your smartphone, use the app to generate a new passcode. Otherwise, go to My XSEDE Profile->Manage Duo and choose Call Me to receive one. After you have logged in to XSEDE, you can log in to Comet using:
gsissh comet
Your Comet username appears in the command prompt. It may not be your NetID. Now that you know your username, you can log in to Comet directly next time:
ssh username@comet.sdsc.edu
where username is your Comet username.

Optional: you may set up Comet so that you can log in without a password. To do so, copy your public key from your laptop to Comet (your public key is in .ssh/id_rsa.pub on your laptop). If you don't have a public key, generate one using ssh-keygen with an empty passphrase. Then, once logged into Comet, create the key file using a text editor, such as vi:

mkdir .ssh
vi .ssh/authorized_keys
and copy and paste the line from .ssh/id_rsa.pub on your laptop into the file .ssh/authorized_keys on Comet.
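
The manual copy-and-paste step can also be scripted. The following is a minimal sketch, not one of the course-provided scripts: install_pubkey is a hypothetical helper name, and it assumes the public key file has already been copied to Comet (for example with scp). It also tightens the permissions on .ssh and authorized_keys, because OpenSSH servers in their default configuration typically ignore key files that other users can write to.

```shell
# install_pubkey KEYFILE [SSH_DIR]
# Hypothetical helper: append the public key in KEYFILE to
# SSH_DIR/authorized_keys (default: ~/.ssh) unless it is already there.
install_pubkey() {
    keyfile=$1
    ssh_dir=${2:-"$HOME/.ssh"}
    auth="$ssh_dir/authorized_keys"

    # Create the directory and file with owner-only permissions.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1

    # Append only if the exact key line is not already present.
    if ! grep -qxFf "$keyfile" "$auth"; then
        cat "$keyfile" >> "$auth"
    fi
}
```

On Comet, after defining the function, running install_pubkey id_rsa.pub would add the key once; running it a second time leaves authorized_keys unchanged.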

How to Login to Comet

If you use a Linux or macOS laptop (or Windows 10), you can ssh or scp to Comet directly from your laptop:

ssh username@comet.sdsc.edu
where username is your Comet username. You can use scp to copy files between your laptop and Comet. For example, the following copies a file from your laptop to your home directory on Comet (add the -r option to copy a whole subdirectory, and swap the source and destination to copy from Comet to your laptop):
scp myfile.java username@comet.sdsc.edu:
On older versions of Windows, you can use PuTTY to log in and FileZilla to transfer files.
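
The copy commands in both directions can be wrapped in two small shell functions. This is only a sketch: to_comet, from_comet, and the SCP variable are hypothetical names introduced here, and username stands for your Comet username. Setting SCP=echo prints the command arguments instead of copying, which is one way to check a command before running it.

```shell
# Placeholder: replace "username" with your Comet username.
COMET="username@comet.sdsc.edu"

# Program used to copy; set SCP=echo for a dry run that only prints the arguments.
SCP="${SCP:-scp}"

# Copy files or directories from the laptop to the Comet home directory.
# -r makes scp descend into directories; it is harmless for plain files.
to_comet() {
    $SCP -r "$@" "$COMET:"
}

# Copy a file or directory from Comet to a local directory (default: current one).
from_comet() {
    $SCP -r "$COMET:$1" "${2:-.}"
}
```

For example, to_comet myfile.java project1 would copy one file and one directory to Comet, and from_comet results would bring a directory named results back to the current directory.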

About the Comet login nodes

The Comet login nodes run CentOS Linux.

The login nodes are for general tasks, such as submitting jobs, downloading data, and editing source programs. You cannot compile Java programs on a login node. The compute nodes, on the other hand, are the computers that do the heavy-duty work of running your programs. You do not interact with compute nodes directly. Instead, you ask the scheduler, SLURM (the Simple Linux Utility for Resource Management), to allocate compute nodes for your application program, and SLURM finds available compute nodes and runs your program on them. The files you see on the login node are shared among all nodes in the cluster. You may use nano or emacs to edit your source programs. To compile and run code, use the SLURM sbatch command (to be explained), which submits a job to the cluster.
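
To make the sbatch workflow concrete, here is a rough sketch of what a small job script might look like. The course-provided scripts are the ones to use; everything specific here is an assumption for illustration: the file name example.job, the Java class MyProgram, the partition name shared, and the resource values. The #SBATCH options themselves (--job-name, --output, --partition, --nodes, --ntasks-per-node, --time) are standard SLURM options.

```shell
# Write a small job script to example.job (hypothetical file name).
cat > example.job <<'EOF'
#!/bin/bash
# Job name and output file; %j is replaced by the job id.
#SBATCH --job-name="example"
#SBATCH --output="example.%j.out"
# Partition name is an assumption; use the one given in the course scripts.
#SBATCH --partition=shared
# Ask for one task on one node, for at most 10 minutes of wall-clock time.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00

# These commands run on the allocated compute node, not on the login node.
# MyProgram is a hypothetical class name.
javac MyProgram.java
java MyProgram
EOF
```

The script would then be submitted from a login node with sbatch example.job. A short --time limit also caps how many SUs a runaway job can consume.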

Usage Limits

The CSE6331 class has been allocated a total of 40,000 SUs (1 SU = 1 core for 1 hour), which corresponds to 1,000 SUs per student. This roughly means that each student should use at most 100 SUs for each programming assignment and at most 400 SUs for the final project. You may see your total account usage using:

show_accounts username
where username is your Comet username. If a student has exceeded the SU limit by the end of the semester, there will be a 5% penalty on the student's final score.
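
Since 1 SU is one core for one hour, the cost of a job can be estimated as cores times hours before submitting it. The helper below is an illustrative sketch (su_cost is a hypothetical name, not a Comet command), and it assumes that a job is charged exactly for the cores and wall-clock time it uses; the actual accounting on Comet may round or charge differently.

```shell
# su_cost CORES MINUTES
# Estimate SUs as cores * hours (1 SU = 1 core for 1 hour).
su_cost() {
    LC_ALL=C awk -v cores="$1" -v minutes="$2" \
        'BEGIN { printf "%.2f\n", cores * minutes / 60 }'
}

su_cost 24 30    # one full 24-core node for 30 minutes -> 12.00
su_cost 48 15    # two full nodes for 15 minutes        -> 12.00
su_cost 1 60     # one core for one hour                -> 1.00
```

By this estimate, the 100 SUs suggested per assignment correspond to a little over four hours on all 24 cores of one node.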

Other SLURM commands:

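squeue -u username     # print info about the pending and running jobs of the user 'username'
scancel <jobid>        # cancel the job with this jobid
sinfo                  # view the state of the cluster's partitions and nodes
sacct -u username      # list your jobs from the accounting log (today's jobs by default)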

How to minimize your SU usage: Always test your programs in local mode first. Only when you are absolutely sure that your program works correctly on small data in local mode should you run it in distributed mode. Scripts will be provided to build and run your programs in local and distributed modes.

More Documentation

The following site gives more information about SLURM:

We thank XSEDE for awarding us an Education grant that gives students taking this course access to the SDSC Comet HPC cluster.

Last modified: 08/11/2019 by Leonidas Fegaras