Slurm Cluster

All information on this page is subject to change! In particular, the hostnames are going to be replaced by nicer alias names.

The cluster consists of two partitions:

  • image1 with 11 compute nodes with 2x20 Intel cores each
  • image2 with 1 compute node with 8x8 AMD cores
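
Once you are logged in (see the next section), you can list the partitions and check the current state of their nodes with sinfo; the exact output depends on the current cluster configuration:

   sinfo
   sinfo -p image1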

Basic Access and First Time Setup

To access the cluster, you first need access to ssh-diku-image.science.ku.dk; ask one of the admins to grant it. You then log in via the gateway and from there to a00552, with <kuid> being your KU username:

  ssh <kuid>@ssh-diku-image.science.ku.dk
  ssh a00552

The simplest way, however, is to add an entry to your ~/.ssh/config:

   Host cluster
       HostName a00552
       User <kuid>
       ProxyCommand ssh -q -W %h:%p <kuid>@ssh-diku-image.science.ku.dk

With this in place, you can log in directly via

  ssh cluster

This also comes in handy if you want to copy your files via scp:

  scp -r my_file1 my_file2 my_folder/ cluster:~/Dir

This copies my_file1, my_file2 and my_folder/ into /home/<kuid>/Dir/. All files in your home directory are available on all compute nodes. You can also copy files back simply with

  scp -r cluster:~/Dir ./

After your first login, you have to set up an SSH key that allows password-free login to the other nodes. This is required for Slurm to function properly! Simply execute the following and leave the passphrase empty when ssh-keygen asks for one:

   ssh-keygen
   ssh-copy-id a00553
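
You can verify the setup by running a command on another node; if everything is correct, no password prompt appears (a00553 serves here only as an example node name):

   ssh a00553 hostname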

Using Slurm

Slurm is a batch processing manager that lets you submit jobs and request a specific amount of resources to be reserved for them. Resources are, for example, memory, number of processing cores, GPUs, or even a number of machines. Moreover, Slurm makes it easy to start arrays of jobs, for example to benchmark an algorithm with different parameter settings (see the array example at the end of this page). When a job is submitted, it is placed in the waiting queue and stays there until the required resources are available. Slurm is therefore perfectly suited for executing long-running tasks.

To see how many jobs are queued, type

   squeue
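
To show only your own jobs, you can filter the queue by user name:

   squeue -u <kuid>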

To submit a job, use

  sbatch sbatchScript.sh

Here sbatchScript.sh is a normal bash or sh script that also contains information about the resources to allocate. A minimal version looks like this:

   #!/bin/bash
   #SBATCH --job-name=MyJob
   #number of independent tasks we are going to start in this script
   #SBATCH --ntasks=1
   #number of cpus we want to allocate for each program
   #SBATCH --cpus-per-task=4
   #We expect that our program will not run longer than 2 days
   #Note that a program will be killed once it exceeds this time!
   #SBATCH --time=2-00:00:00
   #Skipping many options! See man sbatch
   # From here on, we can start our program
   
   ./my_program option1 option2 option3
   ./some_post_processing
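
When the script is submitted, sbatch prints the ID of the new job. A waiting or running job can be cancelled by passing that ID to scancel:

   scancel <jobid>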

Jobs run on the compute node in the same path you were in when you submitted them. This means that storing files relative to your current path will work flawlessly.
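
As mentioned above, Slurm also makes it easy to start arrays of jobs, for example for parameter sweeps. The script below is only a sketch: it assumes that my_program accepts the parameter setting to test as its argument. Each array task receives its index in the environment variable SLURM_ARRAY_TASK_ID.

   #!/bin/bash
   #SBATCH --job-name=MyArrayJob
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=4
   #SBATCH --time=2-00:00:00
   #Start 10 independent array tasks with indices 1 to 10
   #SBATCH --array=1-10

   # Each task uses its array index as the parameter setting to benchmark
   ./my_program $SLURM_ARRAY_TASK_ID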
