Emerald

Emerald is a GPU cluster that provides an alternative capability to traditional HPC systems. In strategic partnership with the consortium, Emerald is hosted and operated at the Science and Technology Facilities Council's Rutherford Appleton Laboratory. Using the NVIDIA K80s, Emerald has achieved ~26.5 TFlops on a single job, while the NVIDIA M2090s have been benchmarked at ~67 TFlops.

Emerald’s Capabilities

  • 8 DELL R720 compute nodes, each with two 8-core E5-2623v3 Intel Haswell CPUs, two NVIDIA K80 GPUs and 128GB memory
  • 56 HP SL390 compute nodes, each with two 6-core X5650 Intel Xeon CPUs, three 512-core NVIDIA M2090 GPUs and 48GB memory
  • 24 HP SL390 compute nodes, each with two 6-core X5650 Intel Xeon CPUs, eight 512-core NVIDIA M2090 GPUs and 96GB memory
  • 4 HP compute nodes, each with two 6-core E5-2640 Intel Sandy Bridge Xeon CPUs, three NVIDIA K20 GPUs and 48GB memory
  • 3 front end nodes for interactive login and remote visualisation
  • QDR Infiniband with Mellanox/Voltaire switches and a fat-tree topology
  • 10Gb Ethernet with Gnodal switches in a grid topology
  • Panasas parallel storage system with approximately 135TB usable disk space

For Users

This guide assumes a working familiarity with Linux and batch systems. If you have any comments or questions, please e-mail Emerald-support@ses.ac.uk.

Logging on

Your username will be sent to you by e-mail. The preferred hostname for logging in is emerald.einfrastructuresouth.ac.uk, which is a DNS round robin across the three UI hosts. If you have intermittent connectivity problems, try one of the IP addresses directly (and let us know).
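For example, a standard SSH login looks like this (your_username is a placeholder for the username sent to you by e-mail):

ssh your_username@emerald.einfrastructuresouth.ac.uk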

Hosts

Cluster nodes are named either cn3gXX.gpu.rl.ac.uk, where XX is 01-60, or cn8gYY.gpu.rl.ac.uk, where YY is 01-24, depending on whether they have 3 GPUs or 8 GPUs. Each system has 12 cores (2 × 6-core CPUs). 3-GPU systems have 48GB of RAM; 8-GPU systems have 96GB of RAM.

Network

All hosts are connected to a pair of Mellanox SX1024 10GigE switches, with 11 40GigE connections between the two switches. There is an additional 40GigE connection to the rest of STFC's network.

All hosts are also connected via a fast InfiniBand network to a series of Voltaire InfiniBand switches, providing low-latency internode communication.

Monitoring

Ganglia is available at https://www.emerald.rl.ac.uk/ganglia
The username and password are ganglia.

Submitting Jobs

Emerald uses Platform LSF 8 and Platform MPI for job submission and MPI.

To submit a very simple job, do

bsub 'echo "Hello world" > ~/out1'

Other useful commands are bjobs, to get the state of submitted jobs; bkill, to terminate jobs; and bpeek, to see the output and error streams from a running job. Man pages are available for these commands. Some useful options for bsub are noted below:

Parameter Description
-q queuename Emerald has two queues. The default queue, emerald, has a default runtime of 2 days. The emerald-short queue can be used for short jobs of up to 12 hours, such as interactive work and development, so each user is only allowed to run a small number of jobs on this queue at once.
-x Host exclusive job – no other job will be scheduled to run on the hosts this job is using. Note that this does not necessarily guarantee that the job will get all the slots on the hosts – some may be reserved for other jobs which are gathering the slots required for them to run.
-n <number> Number of slots to use for job, where a slot is defined as 1 GPU and 1 CPU core. Therefore the number of slots requested should be the larger of the required GPUs and required CPU cores. Note that each host is configured with a number of slots equal to the number of GPUs.
-o <filename> Direct job stdout to a file; note that %J can be used as a placeholder for the job ID
-e <filename> As above but for stderr
-R “span[ptile=<number>]” Allocate <number> slots per host – the 3 GPU hosts have a maximum of 3 slots and the 8 GPU hosts have 8 slots
-R “span[hosts=1]” Use only 1 host for the job, note that 1 is the only valid parameter
-W <hh:mm> Predicted job length. The emerald queue defaults to 48 hours and has a maximum length of 168 hours
-J <name> Give the job a name to show up in bjobs
-m <text> Restrict the job to use slots on a given list of hosts for job execution. E.g.:

-m "cn3g01.gpu.rl.ac.uk cn3g02.gpu.rl.ac.uk".

Two host groups have been defined for convenience – emerald3g and emerald8g which are the nodes with 3 GPUs and the nodes with 8 GPUs (to use them specify -m emerald3g).

If this option is not specified, then the default is to allocate hosts for the job from all Emerald hosts.

-Is Run an interactive job, useful for debugging or compilation. Once submitted the job will not exit, and when it begins running a command line on a cluster node will be returned. When using -Is the last argument to bsub should be the shell you wish to run, e.g.:

bsub -Is -m emerald3g /bin/bash
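As an illustration only (the resource numbers and script name here are hypothetical), several of these options can be combined in one submission. For example, a job needing 6 GPUs, placed as 3 slots on each of two 3-GPU hosts, might be submitted as:

bsub -q emerald-short -n 6 -R "span[ptile=3]" -W 2:00 -o output.%J -e errors.%J -J myjob ./myjob.sh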

bsub can also accept a job script piped into it, e.g.:

bsub < jobscript

Any parameter that can be given on the command line to bsub can also be put in the job script, but the line must begin with #BSUB, e.g.:

#BSUB -o /home/stfc/eissc001/%J.log
#BSUB -e /home/stfc/eissc001/%J.err
#BSUB -W 1:00
mpirun -lsf -prot /home/stfc/eissc001/a/rhel5.out

This job script runs an MPI application; for further information on this please see the MPI page.

Using LSF with other MPI distributions

Platform MPI handles the -lsf parameter and takes the list of hosts directly from LSF; however, OpenMPI and MVAPICH2 have no such option. One solution is to use an environment variable set by LSF, LSB_DJOB_HOSTFILE, which points to a file listing all the hosts assigned to the job (each host appearing as many times as there are cores allocated on it). You can then call mpirun with a hostfile:

mpirun --hostfile $LSB_DJOB_HOSTFILE
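A minimal job script along these lines might look like the sketch below, assuming an OpenMPI mpirun is on your PATH (e.g. loaded via a module); the application name ./my_openmpi_app is a placeholder:

#BSUB -n 6
#BSUB -o %J.log
#BSUB -e %J.err
#BSUB -W 1:00
# LSB_DJOB_HOSTFILE points at a file listing each allocated host once per slot
mpirun --hostfile $LSB_DJOB_HOSTFILE ./my_openmpi_app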

Some system commands

  • bjobs – display a list of jobs submitted by the user, along with details such as the nodes each is executing on, the job name, etc.
  • qstat – status of submitted jobs
  • bkill #jobid – kill a specific job

Parallel File System

File Store

The Panasas parallel storage is provided by Panasas ActiveStor 11 shelves. Each shelf has 1 Director Blade and 10 Storage Blades, each Storage Blade having a raw capacity of 6TB, giving a total usable capacity of 235TB. Each shelf is connected at 10 Gigabit. Each host on Emerald mounts the Panasas storage directly using its proprietary protocol.

Each user has access to 3 classes of storage, described in the table below

Emerald Storage Areas Description
Home Directory Private to each user and located at /home/<institute>/<username>, with a 100GB quota. A daily snapshot is stored internally by Panasas and the storage is backed up to tape weekly.
Work Directory An area shared between members of the same institute and located at /work/<institute>. There is no quota other than the allocated size of the institute's volume. This area is not backed up, but Emerald administrators will not delete data from it without attempting to contact the user first. It is intended for data required for multiple jobs, but that can be recovered from elsewhere if necessary.
Scratch Directory Shared between all users of Emerald and located at /work/scratch, this area is intended for temporary files used by jobs. Emerald administrators reserve the right to delete files older than 48 hours from this area.

Quotas on home directories can be increased, or further volumes created, if required. Please contact us to discuss your requirements.

Viewing quotas and usage

Home directory quotas and usage can be viewed by running the command

$ pan_quota

while in your home directory. This will print out information like that below:

  <bytes>    <soft>    <hard> : <files>    <soft>    <hard> : <path to volume> <pan_identity(name)>
   294912 unlimited unlimited :       5 unlimited unlimited : /home/stfc uid:0(root)

Usage of areas under /work can be viewed by running a command like

$ pan_df /work/stfc

Replace stfc with the name of the area you are interested in; choose from bristol, oxford, soton, stfc or ucl. The standard df command cannot be used as the /work areas are not mounted directly; pan_df gives output similar to:

Filesystem           1K-blocks      Used Available Use% Mounted on
panfs://130.246.139.120/gpu/work/
                     3906250000 251374656 3654875344   7% /work/stfc

Recovering files from snapshots

Snapshots are taken at 4am every day and are accessible by cd-ing into the hidden .snapshot directory from any directory in the home filesystem. Running ls in this directory will give output similar to the following:

$ ls
2012.07.27.04.05.01.gpuhome  2012.07.29.04.05.01.gpuhome  2012.07.31.04.05.01.gpuhome  2012.08.03.04.05.01.gpuhome
2012.07.28.04.05.01.gpuhome  2012.07.30.04.05.01.gpuhome  2012.08.02.04.05.01.gpuhome

Each of these directories holds the state of the directory tree at the time the snapshot was taken. Files that have been deleted or inadvertently overwritten can simply be copied out of the snapshot directory back into place.
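For example, to restore a file deleted from the current directory (myfile.txt is a placeholder, and the snapshot directory is one of those shown in the listing above):

cp .snapshot/2012.08.03.04.05.01.gpuhome/myfile.txt ./myfile.txt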

OpenCL documentation

  • The SHOC benchmark suite, an OpenCL/CUDA/OpenMP/MPI benchmark from Oak Ridge National Laboratory
  • ViennaCL, an open-source library of linear algebra operations (BLAS levels 1, 2 and 3) and of iterative solvers, with optional preconditioners, for large systems of equations

External Access Scheme

Overview

The purpose of this scheme is to allow organisations outside the Consortium access to the EMERALD system in order to conduct exploratory activities such as architecture evaluation, benchmarking, scaling studies and short-term production work (e.g. in support of grant proposals, conference results or paper publication). Reflecting this focus, projects supported under the scheme have been up to six months in duration.

Process

In order to gain access to the system, we require that you complete a Technical Assessment form to inform us of the proposed work, who will require access, what codes will be required and the level of compute resource required. This information will be used to assess the technical feasibility of the proposal and to address any assumptions made in the application.

Applications will be assessed on technical rather than scientific grounds, and there will be no scientific peer review of any proposal submitted. A copy of the Technical Assessment form can be obtained from the Registration page of this site.

Standard charges apply to the use of SES services. For the avoidance of doubt, no financial assistance is available from the centre in support of applications (e.g. to buy software licences, consumables, T&S etc.).

The completed form should be submitted via support@ses.ac.uk and will be collated for review by the Operations Group, who aim to turn the proposal around within one month of receipt; there are no deadlines for receipt of applications for access.

Once access to the system has been agreed, successful applicants will need to accept the standard conditions of access to EMERALD as well as the requirement to complete an end-of-project final report. These reports will be used by the consortium to collect information on usage of the machine across disciplines, research outputs from projects, next steps and general feedback on the service.

Please note that access to the EMERALD system is at the discretion of the Centre for Innovation and users will be required to sign up to the EMERALD conditions of access and comply with reasonable reporting requests made by the consortium from time to time.

Contacts

If you have any questions regarding the external access scheme then please e-mail support@ses.ac.uk.

Research Groups

The following are some of the main research groups using Emerald:

Oxford:

Bristol:

Southampton:

UCL:

STFC:

In addition to users developing their own GPU applications using one of the following:

there will also be users running one of the following third-party molecular dynamics application codes:

  • NAMD (contact: Phil Biggin / Maria Musgaard)
  • AMBER (contact: Adrian Mulholland / Marc van der Kamp)
  • GROMACS (contact: Mark Sansom / Sarah Rouse)
  • LAMMPS (contact: Jon Essex / Sophia Wheeler)

A range of standard Linux software development utilities is installed.

  • SVN – to get SVN working you need to edit ~/.subversion/servers and set http-proxy-host = wwwcache.rl.ac.uk and http-proxy-port = 8080 (see the example snippet after this list)
  • Git – Git should work out of the box; however, if you are accessing e.g. GitHub you will have to add a public key to your account. For general troubleshooting, see https://help.github.com/articles/error-permission-denied-publickey
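As a sketch, the relevant settings in ~/.subversion/servers would normally sit in the [global] section, something like:

[global]
http-proxy-host = wwwcache.rl.ac.uk
http-proxy-port = 8080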

Most of the software below is either free software or covered by STFC's licences for use on Emerald. We will not remove any software from Emerald without notifying all users. Issues with any of the software below, or with software installed in standard system locations such as /usr/bin or /bin, should be reported to the helpdesk, where Emerald admin staff will assist. Other software, such as contrib modules or software installed in users' home directories or work areas, cannot be supported by Emerald admin staff.

Software Description Available versions
Amber Amber is a package of programs for molecular dynamics simulations of proteins and nucleic acids. How to use GPUs in Amber. amber/12(default)
amber/12.9
AMD OpenCL support amdopencl/2.7
Boost C++ library The default Boost library is 1.41; should you need a later version, this module can be used boost/1.53
CUDA SDK NVidia’s CUDA toolkit, documentation and compilers cuda/4.0.17
cuda/4.1.28
cuda/4.2.9(default)
cuda/5.0.24rc
cuda/5.0.35
cuda-sim/0.7
DL_POLY DL_POLY is a general-purpose classical molecular dynamics (MD) simulation package. Tips for using GPUs. dl_poly/4.03.4
GMSH GMSH is a three-dimensional Finite Element mesh generator gmsh/2.7
GROMACS GROMACS is a package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. Using GPUs
IDL IDL is a scientific programming language used across disciplines to create visualizations from complex numerical data idl/8.1
Intel Compiler Suite intel/11.1
intel/12.0
intel/12.1(default)
intel/cce/11.1.073
intel/cce/12.0.0.084
intel/cce/12.1.0.233
intel/cce/12.1.11.339
intel/fce/11.1.073
intel/fce/12.0.0.084
intel/fce/12.1.0.233
intel/fce/12.1.11.339
intel/idb/12.0
intel/mkl/10.2.5.035
intel/mkl/10.3.0.084
intel/mkl/10.3.6.233
LAMMPS LAMMPS is a classical molecular dynamics code for soft and solid-state materials and coarse-grained or mesoscopic systems. Using GPUs lammps/12.5.5
CFITSIO CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format. CFITSIO provides simple high-level routines for reading and writing FITS files that insulate the programmer from the internal complexities of the FITS format libcfitsio/gnu/3310
FFTW FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST) libfftw/gnu/3.3.2
libfftw/gnu/3.3.2_mpi
libfftw/intel/3.3.2
libfftw/intel/3.3.2_mpi
libfftw/pgi/3.2.2
libfftw/pgi/3.2.2_mpi
MPI for Python MPI for Python (mpi4py) provides bindings of the Message Passing Interface (MPI) standard for the Python programming language, allowing any Python program to exploit multiple processors. mpi4py/1.2.2
mpi4py/1.3
NAMD NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Using GPUs. namd/2.9_cuda_mpi
namd/2.9_mpi
Octopus Octopus is a quantum-mechanics simulator. Using GPUs. octopus/4.1.0
octopus/4.1.1
octopus/4.1.2
Orca Orca is an electronic structure program package orca/3.0
ParaView paraview/3.98.0
PGI compiler pgi/12.5
pgi/12.6
pgi/12.9
pgi/13.3
PYCuda CUDA bindings for Python pycuda/2012.1
Python 2.7 and additional modules The version of Python provided by the operating system is 2.6.6; if a later version is required, these modules can be used python/2.7
python27/mpi4py/1.2.2
python27/mpi4py/1.3
python27/numpy/1.6.2
python27/pycuda/2012.1
VMD VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems vmd/1.9.1

Software is provided via modules. Run

module avail

to see the list of loadable software. Run

module load <module name>

to load a module. Run

module list

to display your currently loaded modules. To see what a module is doing, run

module show <module name>
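For instance, a typical session to compile a CUDA code might look like the following (the source file name is a placeholder, and the module version is one of those listed above):

module load cuda/4.2.9
nvcc -o my_kernel my_kernel.cu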

When presenting work that has made use of the Centre for Innovation services, please use the standard acknowledgement below:

The authors would like to acknowledge that the work presented here made use of the Emerald High Performance Computing facility made available by the Centre for Innovation. The Centre is formed by the universities of Oxford, Southampton, Bristol, and University College London in partnership with the STFC Rutherford Appleton Laboratory.