Running R on the Cluster
There are two commands commonly used to run R scripts on an HPC cluster: R CMD BATCH and Rscript.
R CMD BATCH
R CMD BATCH runs an R infile non-interactively and writes the output to a file.
By default, the output is saved to a file named after the infile, with its .R extension replaced by .Rout.
Here’s an example:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1gb
#SBATCH --job-name=RTest
module load R
R CMD BATCH Rcode.R
In this example, the output will be written to Rcode.Rout. No output or error directives are specified, because R CMD BATCH already sends both the output and any error messages to Rcode.Rout.
To use a different output file, give its name after the R infile in the R CMD BATCH command.
You can also pass arguments to the script in the command. They must be given after R CMD BATCH but before the R infile name, with --args and the arguments enclosed in one pair of quotes; otherwise R CMD BATCH would treat the first argument as the infile name.
For example:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1gb
#SBATCH --job-name=RTest
module load R
R CMD BATCH "--args arg1 arg2 arg3" Rcode.R Routput.txt
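Inside Rcode.R, the values given after --args can be read with base R's commandArgs(trailingOnly = TRUE), which returns them as a character vector. The sketch below assumes, purely for illustration, that arg1 is an input file name and arg2 and arg3 are whole numbers; parse_job_args() is a hypothetical helper, not part of R.

```r
# Rcode.R: read the arguments given after --args.
# parse_job_args() and the meaning of each argument are hypothetical.
parse_job_args <- function(args) {
  if (length(args) != 3) {
    stop("expected 3 arguments, got ", length(args))
  }
  # Arguments always arrive as strings, so convert numbers explicitly.
  list(
    input  = args[1],
    n_iter = as.integer(args[2]),
    seed   = as.integer(args[3])
  )
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  opts <- parse_job_args(args)
  set.seed(opts$seed)
  print(opts)  # with R CMD BATCH this ends up in the output file
}
```

For instance, R CMD BATCH "--args data.csv 100 42" Rcode.R Routput.txt would write the parsed list to Routput.txt.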
Rscript
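Rscript runs your R infile non-interactively, like R CMD BATCH, but sends normal output to STDOUT and errors to STDERR instead of a single .Rout file. Slurm can then capture them in separate files with the output and error directives, which gives you finer control.
Arguments are passed by listing them after the script name; unlike R CMD BATCH, Rscript does not need --args or quotes.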
Here’s an example:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1gb
#SBATCH --job-name=RTest
#SBATCH --error=RTest.%J.stderr
#SBATCH --output=RTest.%J.stdout
module load R
Rscript Rcode.R arg1 arg2 arg3
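Because Rscript keeps the two streams apart, anything the script writes with cat() or print() goes to the --output file, while message(), warnings and error messages go to the --error file. An uncaught error makes Rscript exit with a non-zero status, and you can also set the status yourself with quit(status = ...), which lets Slurm record the job as failed. A minimal sketch, assuming a single hypothetical argument n and a placeholder run_analysis() function:

```r
# Rcode.R, run as: Rscript Rcode.R <n>
# run_analysis() is a hypothetical placeholder for your real computation.
run_analysis <- function(n) {
  mean(seq_len(n))
}

# Returns an exit status: 0 = success, 1 = bad value, 2 = missing argument.
main <- function(args) {
  if (length(args) < 1) {
    message("usage: Rscript Rcode.R <n>")  # message() writes to STDERR
    return(2L)
  }
  n <- suppressWarnings(as.integer(args[1]))
  if (is.na(n) || n < 1L) {
    message("error: <n> must be a positive integer, got '", args[1], "'")
    return(1L)
  }
  # cat() writes to STDOUT, i.e. the --output file in the job script
  cat("mean of 1..", n, " = ", run_analysis(n), "\n", sep = "")
  0L
}

args <- commandArgs(trailingOnly = TRUE)
# Run main() only when arguments were given, so the file can also be
# source()d in an interactive session without quitting R.
if (length(args) > 0) {
  quit(save = "no", status = main(args))
}
```

Submitted with a command such as Rscript Rcode.R 4, this would put "mean of 1..4 = 2.5" in the RTest.%J.stdout file and leave RTest.%J.stderr empty.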
Parallel R
Running a multicore parallel job is similar to the previous batch scripts, but the scheduler needs some more information about the computing environment.
It needs to know how many cores you want. Since each task gets one core by default, you can request cores with the ntasks directive.
You should also specify nodes=1 so that all the tasks run on the same node; a single multicore R process can only use the cores of the node it runs on.
Here’s an example:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1gb
#SBATCH --ntasks=16
#SBATCH --nodes=1
#SBATCH --job-name=RTest
#SBATCH --error=RTest.%J.stderr
#SBATCH --output=RTest.%J.stdout
module load R
Rscript Rcode.R
For best performance, make sure ntasks matches the number of cores your R code actually uses.
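One way to keep the two in step is to have the R script read the core count from the SLURM_NTASKS environment variable, which Slurm sets when --ntasks is given, instead of hard-coding it. This is also safer than parallel::detectCores(), which reports the cores present on the machine rather than the ones allocated to your job. A minimal sketch using mclapply() from R's bundled parallel package; slow_square() is a hypothetical workload. mclapply() works by forking, so on Windows it only supports mc.cores = 1.

```r
library(parallel)

# Cores allocated by Slurm through --ntasks; fall back to 1 when the
# variable is unset or invalid (e.g. when testing outside a Slurm job).
n_cores <- function() {
  n <- suppressWarnings(as.integer(Sys.getenv("SLURM_NTASKS", unset = "1")))
  if (is.na(n) || n < 1L) 1L else n
}

# Hypothetical workload: replace with your own per-item computation.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

cores <- n_cores()
results <- mclapply(1:32, slow_square, mc.cores = cores)
cat("used", cores, "cores; sum of squares =", sum(unlist(results)), "\n")
```

With the job script above, SLURM_NTASKS would be 16, so mclapply() would spread the 32 calls over 16 forked workers.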
Installing R Packages via Conda
If a package fails to install with the install.packages function in R, you can try installing it via Conda instead.
You'll often find the package you're looking for in a Conda repository.
For example, you may want to install ggplot2:
install.packages("ggplot2")
If that fails, you can try installing the package via Conda. It's always a good idea to create a new environment instead of installing into the Conda base environment:
conda create -n R
conda activate R
conda install r-ggplot2
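In a batch job it is not always obvious that install.packages() failed: many failures, such as a package being unavailable or failing to compile, are reported as warnings rather than errors, so the script keeps running. A minimal sketch of checking the result explicitly with requireNamespace() before falling back to Conda; the helper name install_or_report() and the CRAN mirror URL are assumptions, and the suggested Conda name follows the usual r- plus lowercase name convention, which may not hold for every package.

```r
# install_or_report() is a hypothetical helper, not part of R.
# The mirror URL is an assumption; use your site's preferred CRAN mirror.
# An explicit repos is needed in batch jobs, where R cannot prompt for one.
install_or_report <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Many install failures are reported as warnings, not errors.
    install.packages(pkg, repos = repos)
  }
  ok <- requireNamespace(pkg, quietly = TRUE)
  if (!ok) {
    message(pkg, " is still not available; try: conda install r-", tolower(pkg))
  }
  ok
}
```

For example, install_or_report("ggplot2") returns TRUE once ggplot2 can be loaded and FALSE, with a hint to try Conda, otherwise.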