Conda

What is conda

Conda is a package and environment management system.

It allows users to easily install different versions of software packages and libraries side by side.

If you need an older or newer version of a package than the HPC provides, conda can provide a solution.

You can create an environment with the exact package versions you need, without affecting the environment of anyone else.

Installing Conda (Miniconda)

You can download a version of conda from here: https://docs.conda.io/en/latest/miniconda.html

To install the latest version, run these commands:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh

By default, Miniconda is installed in your home directory (~/miniconda3).

As the installer modifies your bash profile (~/.bashrc), you will need to log out and back in to use it, or run this command:

source ~/.bashrc
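If you would rather script the installation, the installer can also run without prompts. The sketch below is a hypothetical helper, install_miniconda (not part of conda), which assumes the installer URL above and the default ~/miniconda3 location, and skips the download if that directory already exists. It uses the installer's -b option (batch mode, which assumes you accept the licence terms) and -p option (install location). As far as we know, batch mode does not run conda init for you, so the helper runs conda init bash itself; running it on an already initialised shell should simply report no change.

```shell
#!/bin/bash
# Hypothetical helper (not part of conda): non-interactive Miniconda install.
# Assumes the installer URL shown above and ~/miniconda3 as the default prefix.
install_miniconda() {
    local prefix="${1:-$HOME/miniconda3}"
    local installer="Miniconda3-latest-Linux-x86_64.sh"

    # Do nothing if the target directory already exists
    if [ -d "$prefix" ]; then
        echo "Miniconda already present at $prefix, skipping install"
        return 0
    fi

    wget "https://repo.anaconda.com/miniconda/$installer" || return 1
    # -b: batch mode (no prompts, licence terms assumed accepted); -p: install prefix
    bash "./$installer" -b -p "$prefix" || return 1
    rm -f "./$installer"
    # Assumption: batch mode does not run conda init, so do it explicitly
    "$prefix/bin/conda" init bash
}

# Usage:
#   install_miniconda      # installs into ~/miniconda3
#   source ~/.bashrc
```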

Once installed, it will modify your command prompt. You should now see (base) before your username:

(base) [padmacor@loginhpc ~]$

You can now use it by running the conda command.

We can run the following to get more information about our installation:

(base) [padmacor@loginhpc ~]$ conda info

     active environment : base
    active env location : /home/padmacor/miniconda3
            shell level : 1
       user config file : /home/padmacor/.condarc
 populated config files :
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.10.10.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.28=0
                          __linux=4.18.0=0
                          __unix=0=0
       base environment : /home/padmacor/miniconda3  (writable)
      conda av data dir : /home/padmacor/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/padmacor/miniconda3/pkgs
                          /home/padmacor/.conda/pkgs
       envs directories : /home/padmacor/miniconda3/envs
                          /home/padmacor/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.28.1 CPython/3.10.10 Linux/4.18.0-425.3.1.el8.x86_64 centos/8.7 glibc/2.28
                UID:GID : 12317:10000
             netrc file : None
           offline mode : False

Using Conda

Conda allows us to create different environments for different purposes, and to switch between them when needed.

For example, you may need a specific version of R that is not currently installed on the HPC.

We can create an environment to install it in, without needing to wait for the HPC admin.

To create the environment, we use the conda create command:

conda create --name R.4.2

Conda will then configure the new environment, and we can use the following command to enter it:

conda activate R.4.2

Now that the environment is activated, we can start installing the software we need.

Let’s search the conda repository to find which versions of R are available:

conda search R

We get this result:

Loading channels: done
# Name                       Version           Build  Channel
r                              3.3.1        r3.3.1_0  pkgs/r
r                              3.3.1        r3.3.1_1  pkgs/r
r                              3.3.2        r3.3.2_0  pkgs/r
r                              3.4.1        r3.4.1_0  pkgs/r
r                              3.4.2      h65d9972_0  pkgs/r
r                              3.4.3        mro343_0  pkgs/r
r                              3.4.3          r343_0  pkgs/r
r                              3.5.0        mro350_0  pkgs/r
r                              3.5.0          r350_0  pkgs/r
r                              3.5.1        mro351_0  pkgs/r
r                              3.5.1          r351_0  pkgs/r
r                              3.6.0           r36_0  pkgs/r

It seems the version of R we’re looking for is not available in the default channels.

Let’s specify conda-forge as the channel. It’s a community-managed collection of packages for conda.

Most of the time you should find what you’re looking for there:

conda search R --channel conda-forge

The --channel (or -c) option allows us to specify a channel to use. We can also specify multiple channels at the same time:

conda search R -c conda-forge -c bioconda

(bioconda is a channel consisting of thousands of software packages related to biomedical research)

Searching on conda-forge gives us this result:

# Name                       Version           Build  Channel
r                              3.3.1        r3.3.1_0  pkgs/r
r                              3.3.1        r3.3.1_1  pkgs/r
r                              3.3.2        r3.3.2_0  conda-forge
r                              3.3.2        r3.3.2_0  pkgs/r
r                              3.4.1        r3.4.1_0  conda-forge
r                              3.4.1        r3.4.1_0  pkgs/r
r                              3.4.2      h65d9972_0  pkgs/r
r                              3.4.3        mro343_0  pkgs/r
r                              3.4.3          r343_0  pkgs/r
r                              3.5.0        mro350_0  pkgs/r
r                              3.5.0          r350_0  pkgs/r
r                              3.5.1        mro351_0  pkgs/r
r                              3.5.1          r351_0  conda-forge
r                              3.5.1          r351_0  pkgs/r
r                              3.5.1       r351_1000  conda-forge
r                              3.5.1        r35_1002  conda-forge
r                              3.5.1        r35_1003  conda-forge
r                              3.6.0           r36_0  pkgs/r
r                                3.6        r36_1002  conda-forge
r                                3.6        r36_1003  conda-forge
r                                3.6        r36_1004  conda-forge
r                                4.0        r40_1004  conda-forge
r                                4.0 r40hd8ed1ab_1004  conda-forge
r                                4.1 r41hd8ed1ab_1004  conda-forge
r                                4.1 r41hd8ed1ab_1005  conda-forge
r                                4.1 r41hd8ed1ab_1006  conda-forge
r                                4.2 r42hd8ed1ab_1006  conda-forge

We can see the version we’re looking for is available through conda-forge.
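With this many entries, it can help to narrow the search output down to one version. The sketch below is a hypothetical helper, builds_for_version (not a conda command), that reads conda search output on standard input and prints the build string and channel of every line whose version column matches exactly. It assumes the four-column layout shown above (Name, Version, Build, Channel).

```shell
#!/bin/bash
# Hypothetical helper: list build strings for one version from `conda search` output.
# Assumes the columns shown above: Name, Version, Build, Channel.
builds_for_version() {
    # $1: the exact version to look for, e.g. 4.2
    awk -v v="$1" '
        $1 ~ /^#/ { next }                  # skip the header line
        ($2 "") == (v "") { print $3, $4 }  # string compare, so 4.1 does not match 4.10
    '
}

# Usage:
#   conda search R -c conda-forge | builds_for_version 4.2
```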

To install a package in conda we can use the conda install command.

As several builds can exist for the same version, we can also pin the exact build string.

We can use the command in this format:

conda install -c <channel> <package_name>=<version>=<build_string>

Which in our case would be:

conda install -c conda-forge r=4.2=r42hd8ed1ab_1006
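If you build install commands in a script, a small helper can keep the <package_name>=<version>=<build_string> format consistent. The sketch below is a hypothetical function, conda_spec, that treats the build string as optional; without one, conda chooses a build for the requested version itself. Quoting the resulting spec is a good habit, because specs such as "r>=4.2" contain characters the shell would otherwise treat as a redirection.

```shell
#!/bin/bash
# Hypothetical helper: build a conda spec <package>=<version>[=<build_string>].
conda_spec() {
    local name="$1" version="$2" build="${3:-}"
    if [ -z "$name" ] || [ -z "$version" ]; then
        echo "usage: conda_spec <package> <version> [build_string]" >&2
        return 1
    fi
    if [ -n "$build" ]; then
        printf '%s=%s=%s\n' "$name" "$version" "$build"
    else
        # No build string: let conda pick a build for this version
        printf '%s=%s\n' "$name" "$version"
    fi
}

# Usage, inside the activated environment:
#   conda install -c conda-forge "$(conda_spec r 4.2 r42hd8ed1ab_1006)"
```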

Conda will now download all the packages needed to run R.

In my case it was 108 packages, with a total size of 255 MB.

As conda keeps each environment separate from the others and from the OS, each environment needs an independent set of packages.

This can take up quite a lot of space if you install lots of software into your environments.
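To see which environments are taking up the most space, you can check their directories with du. The sketch below is a hypothetical helper, env_sizes, that assumes your environments live under the default ~/miniconda3/envs directory (listed under envs directories in the conda info output above) and prints their sizes in kilobytes, largest first. Separately, conda clean --all removes unused cached packages, tarballs and index caches from conda's package cache.

```shell
#!/bin/bash
# Hypothetical helper: disk usage of each conda environment, largest first.
# Assumes environments are under <prefix>/envs (default prefix ~/miniconda3).
env_sizes() {
    local prefix="${1:-$HOME/miniconda3}"
    if [ ! -d "$prefix/envs" ]; then
        echo "No envs directory under $prefix" >&2
        return 1
    fi
    # -s: one total per environment; -k: sizes in KiB so sort -n orders them
    du -sk "$prefix"/envs/*/ 2>/dev/null | sort -rn
}

# Usage:
#   env_sizes
#   conda clean --all    # free space used by conda's package cache
```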

Now that R is installed, we can run it and check the version:

R --version

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
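In a job script, it can be useful to stop early if the environment does not provide the R version you expect. The sketch below is a hypothetical helper, version_at_least, that compares two version strings with GNU sort -V (version ordering), so that 4.10 counts as newer than 4.2. The usage lines assume the version is the third word of the first line printed by R --version, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $2 is at least version $1.
# Relies on GNU sort -V, which orders 4.2 < 4.2.3 < 4.10.
version_at_least() {
    local required="$1" actual="$2"
    # The smaller version sorts first; if that is the requirement, we pass
    [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n1)" = "$required" ]
}

# Usage, with the R environment active:
#   r_version=$(R --version | head -n1 | awk '{print $3}')
#   version_at_least 4.2 "$r_version" || { echo "R $r_version is too old" >&2; exit 1; }
```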

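Once we’ve finished using the environment, we can deactivate it by typing:

conda deactivate

This returns you to the previously active environment (usually base). Running conda activate without naming an environment also leaves the current environment, but activates the base environment instead.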

Base Environment

When you install conda, it will activate the default “base” environment. This includes a Python installation and some core libraries and dependencies of conda.

Conda recommends that you avoid installing additional packages into your base environment. Any additional packages you need should be installed into their own conda environment.

The base environment will be activated each time you log in, but this may not be what you want.

You can prevent this behavior by using the conda config command:

conda config --set auto_activate_base false

Or deactivate the base environment for your current session with the command:

conda deactivate

This can be useful if you want to use the Python packages installed centrally on the HPC.

Using conda with sbatch

You can include conda commands in your sbatch scripts:

#!/bin/bash
#SBATCH --job-name=parallel_test      # Job name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@lshtm.ac.uk # Where to send mail
#SBATCH --nodes=1                     # Run all processes on a single node
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=16            # Number of CPU cores for that task
#SBATCH --mem=1gb                     # Total memory limit
#SBATCH --time=00:05:00               # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.log      # Standard output and error log
date;hostname;pwd

conda activate R.4.2

echo "Running R in parallel on 16 cores"

Rscript Rcode.R

date

While you can include conda install commands in your sbatch script, I’d suggest configuring the environment before submitting your jobs.
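As a safeguard in batch jobs, you could check that the activation actually took effect before running anything. The sketch below is a hypothetical helper, require_conda_env, which relies on the CONDA_DEFAULT_ENV variable that conda activate sets to the environment's name when you activate it by name (as in the script above).

```shell
#!/bin/bash
# Hypothetical helper: fail if the expected conda environment is not active.
# Relies on CONDA_DEFAULT_ENV, set by `conda activate <name>`.
require_conda_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# In the job script, after activation:
#   conda activate R.4.2
#   require_conda_env R.4.2 || exit 1
```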

Troubleshooting

If you’re getting messages like:

/bin/sh: module: line 1: syntax error: unexpected end of file

/bin/sh: error importing function definition for `BASH_FUNC_module'

/bin/sh: switchml: line 1: syntax error: unexpected end of file

/bin/sh: error importing function definition for `BASH_FUNC_switchml'

/bin/sh: _module_raw: line 1: syntax error: unexpected end of file

/bin/sh: error importing function definition for `BASH_FUNC__module_raw'

If these appear in the error output after submitting your job, try adding the following to your script after the #SBATCH options:

source ~/.bashrc

When you activate an environment in your sbatch script, use source instead of conda:

source activate <environment>
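Another option, if sourcing your whole ~/.bashrc causes problems, is to load only conda's shell functions. A standard Miniconda install provides a script at etc/profile.d/conda.sh under the install directory that defines the conda shell function, which is what makes conda activate work. The sketch below is a hypothetical helper, init_conda, that assumes the default ~/miniconda3 location.

```shell
#!/bin/bash
# Hypothetical helper: make `conda activate` available in a batch job
# without reading ~/.bashrc. Assumes the default ~/miniconda3 prefix.
init_conda() {
    local prefix="${1:-$HOME/miniconda3}"
    local hook="$prefix/etc/profile.d/conda.sh"
    if [ ! -f "$hook" ]; then
        echo "conda.sh not found at $hook" >&2
        return 1
    fi
    # Defines the `conda` shell function in the current shell
    . "$hook"
}

# In the sbatch script, after the #SBATCH lines:
#   init_conda || exit 1
#   conda activate R.4.2
```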

Documentation

More information about the conda commands can be found at: https://docs.conda.io/projects/conda/en/latest/commands/index.html