Scheduling jobs using Slurm:

Jobs must be scheduled using the Slurm workload manager through the control node, cassio.cs.nyu.edu. Access to this cluster and the control node is restricted to those who are part of the CILVR group.

If you are on a network outside of CIMS, you must first log in to access.cims.nyu.edu and then ssh from there to cassio.cs.nyu.edu. Once you launch a job, you will be able to ssh directly to any of the nodes assigned to it.
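
For example, logging in from outside the CIMS network might look like the following sketch ("netid" is a stand-in for your own CIMS username, and the one-step -J form assumes a reasonably recent OpenSSH client):

# from your own machine, log in to the access gateway first
ssh netid@access.cims.nyu.edu

# then, from access, continue on to the Slurm control node
ssh cassio.cs.nyu.edu

# or jump through access in a single step
ssh -J netid@access.cims.nyu.edu netid@cassio.cs.nyu.edu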

Slurm's documentation is very thorough; the quickstart guide should provide enough information to start exploring our Slurm setup.

 

There are currently two Quality of Service (QOS) policies that affect jobs run via the scheduler:

(1) Interactive: Each student can run interactive jobs using up to 2 GPUs in total, either one interactive job with two GPUs or two interactive jobs with one GPU each. Each job can run for up to 1 week. This QOS should be used mainly for development.

(2) Batch: This is the default QOS. You can use up to 8 GPUs for batch jobs. As with the interactive QOS, you can launch as many jobs as you want as long as the total number of allocated GPUs does not exceed eight. Each job can run for up to 2 days; this limit was chosen to ensure fair allocation among users. Please use checkpointing for any longer job.
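
If you want to check these limits yourself, the scheduler's accounting records can be queried from cassio. A sketch, assuming accounting is configured to expose the QOS and association records (the field names follow sacctmgr's standard format options):

# list the QOS definitions, including wall-time and per-user resource (TRES) limits
sacctmgr show qos format=name,maxwall,maxtrespu

# show which QOS values your own account is allowed to use
sacctmgr show assoc where user=$USER format=user,account,qos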

  

Below are a few command examples that are relevant to our environment:

- Request a "batch" QOS bash shell session with one "titanblack" model GPU:

srun --qos=batch --gres=gpu:titanblack:1 --pty bash
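
Once the interactive shell starts on the assigned compute node, you can confirm which GPU(s) Slurm allocated to you; for example (nothing here is specific to our setup beyond the NVIDIA tools being installed on the node):

# inside the srun shell, on the compute node:
nvidia-smi                      # shows the GPU(s) visible to this job
echo $CUDA_VISIBLE_DEVICES      # typically set by Slurm to the allocated GPU index(es)
hostname                        # the node you can also ssh to directly while the job runs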

Currently, the GPU models you can specify with the --gres option are: 1080ti, titanxp, titanblack, k40, k20, k20x, and m2090.
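
To see which of these GPU models live on which nodes, you can ask sinfo for the generic-resources column. A sketch using standard sinfo format specifiers (%N is the node name, %G the GRES list):

sinfo --Node --format="%.10N %.40G"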

 

- Request 2 GPU cards by their memory size, using the --constraint option along with the associated "feature" label:

srun --qos=interactive --gres=gpu:2 --constraint=gpu_12gb --pty bash

 

- Use boolean operators with the --constraint option to group feature requests:

srun --qos=interactive --gres=gpu:2 --constraint="gpu_12gb&kepler" --pty bash

 

- Show all nodes along with their name (%N), state (%T), number of CPUs (%c), memory in MB (%m), and "features" (%f):

sinfo --Node --format="%.6N %.8T %.4c %.10m %.20f"

  

- squeue displays job status, and can be formatted like sinfo:

squeue -l --format="%.5i %.15q %.6j %.6b %.6D %.6N %.25S %.16L"
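
A couple of day-to-day variations (sketches; 123456 stands in for a real job ID from your own queue):

# show only your own jobs
squeue -u $USER

# show the state and node(s) of a particular job, e.g. before ssh-ing to it
squeue -j 123456 --format="%.5i %.10T %N"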

 

- You can launch a job by creating a job script and submitting it with the sbatch command. Below is an example sbatch script ("tf.sbatch"), taken from http://sherlock.stanford.edu/mediawiki/index.php/Tensor_flow, that sends an email notification when the job completes; it has been modified to run a TensorFlow Python script in our environment:

#!/bin/bash
#
# All lines that start with #SBATCH are directives used only by Slurm for scheduling.
#################
# set a job name
#SBATCH --job-name=GPUTFRtest
#################
# a file for job output; you can check job progress here
#SBATCH --output=GPUTFtest.out
#################
# a file for errors from the job
#SBATCH --error=GPUTFtest.err
#################
# wall-clock time you think you need, here in mm:ss (the default is one hour);
# the less you ask for, the sooner your job is likely to start.
# This example runs in less than 5 minutes.
#SBATCH --time=15:00
#################
# --gres requests GPUs; here we ask for two. You can ask for more,
# up to however many are on the node.
#SBATCH --gres=gpu:2
# we are submitting under the batch QOS
#SBATCH --qos=batch
#################
# number of nodes you are requesting
#SBATCH --nodes=1
#################
# memory per node, in MB (the default is 4000 MB per CPU)
#SBATCH --mem=4000
#################
# have Slurm send you an email when the job ends or fails
# (careful: the message could end up in your clutter folder)
#SBATCH --mail-type=END,FAIL # notifications for job done & fail
#SBATCH --mail-user=user@courant.nyu.edu

# please note: before using python3 you need to load it via our module system
module load python-3

srun python3 ./tf_test.py
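
With the script saved as tf.sbatch (and a tf_test.py of your own alongside it; the script itself is not provided here), submitting and keeping an eye on the job could look like this sketch (the job ID printed by sbatch will differ):

# submit the job script to the scheduler; sbatch prints the new job's ID
sbatch tf.sbatch

# check the job's state in the queue
squeue -u $USER

# follow the job's output file as it is written
tail -f GPUTFtest.out

# cancel the job if needed (replace 123456 with the real job ID)
scancel 123456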

 

NVIDIA provides online GPU computing seminars.

 

Let us know if there are any other specific examples you'd like us to provide, or if you need anything else to get started.

If you run into problems, please contact helpdesk@cims.nyu.edu.
