Conda#
What is conda#
Conda is a package and environment management system.
It allows users to easily install different versions of software packages and libraries at the same time.
If you need an older or newer version of a package than the HPC provides, conda can provide a solution.
You can create an environment with the versions of the packages you need without affecting the environment of anyone else.
Installing Conda (MiniConda)#
You can download a version of conda from here: https://docs.conda.io/en/latest/miniconda.html
To install the latest version, we can use these commands:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh
By default, Miniconda will be installed to your home directory (~/miniconda3).
As it modifies your bash profile, you will need to sign out and back in to use it, or run this command:
source ~/.bashrc
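To confirm that the conda command is visible in the current shell, a quick check like the following can help. This is a minimal sketch; the function name check_conda is made up for illustration and is not part of conda.

```shell
# Report whether the conda command is available in the current shell.
# check_conda is a hypothetical helper name, not a conda feature.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found: $(conda --version)"
    else
        echo "conda not found; try 'source ~/.bashrc' or log in again"
        return 1
    fi
}

check_conda || true
```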
Once installed, it will modify your command prompt. You should now see (base) before your username:
(base) [padmacor@loginhpc ~]$
You can now use it by running the conda command.
We can run the following to get more information about our installation:
(base) [padmacor@loginhpc ~]$ conda info
active environment : base
active env location : /home/padmacor/miniconda3
shell level : 1
user config file : /home/padmacor/.condarc
populated config files :
conda version : 23.3.1
conda-build version : not installed
python version : 3.10.10.final.0
virtual packages : __archspec=1=x86_64
__glibc=2.28=0
__linux=4.18.0=0
__unix=0=0
base environment : /home/padmacor/miniconda3 (writable)
conda av data dir : /home/padmacor/miniconda3/etc/conda
conda av metadata url : None
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/padmacor/miniconda3/pkgs
/home/padmacor/.conda/pkgs
envs directories : /home/padmacor/miniconda3/envs
/home/padmacor/.conda/envs
platform : linux-64
user-agent : conda/23.3.1 requests/2.28.1 CPython/3.10.10 Linux/4.18.0-425.3.1.el8.x86_64 centos/8.7 glibc/2.28
UID:GID : 12317:10000
netrc file : None
offline mode : False
Using Conda#
Conda allows us to create different environments for different purposes, and to switch between them when needed.
For example, you may need a specific version of R that is not currently installed on the HPC.
We can create an environment to install it in without needing to wait for the HPC Admin.
To create the environment we use the command conda create:
conda create --name R.4.2
Conda will then configure the new environment, and we can use the following command to enter it:
conda activate R.4.2
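If you are not sure whether an environment has already been created, the output of conda env list can be checked first. The sketch below assumes the environment name R.4.2 from this guide; env_exists is a hypothetical helper, not a conda command.

```shell
ENV_NAME="R.4.2"   # the environment name used in this guide

# Succeed if an environment with exactly this name appears in `conda env list`.
# env_exists is a hypothetical helper name.
env_exists() {
    conda env list 2>/dev/null | awk '{print $1}' | grep -Fxq -- "$1"
}

if ! command -v conda >/dev/null 2>&1; then
    echo "conda is not available in this shell"
elif env_exists "$ENV_NAME"; then
    echo "Environment $ENV_NAME already exists"
else
    echo "Environment $ENV_NAME not found; create it with: conda create --name $ENV_NAME"
fi
```

The check compares the first column of the listing against the full name, so R.4 does not match R.4.2.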
Now that we have the environment activated, we can start installing the software we need.
Let’s search the conda repository to find which versions of R are available:
conda search R
We get this result:
Loading channels: done
# Name Version Build Channel
r 3.3.1 r3.3.1_0 pkgs/r
r 3.3.1 r3.3.1_1 pkgs/r
r 3.3.2 r3.3.2_0 pkgs/r
r 3.4.1 r3.4.1_0 pkgs/r
r 3.4.2 h65d9972_0 pkgs/r
r 3.4.3 mro343_0 pkgs/r
r 3.4.3 r343_0 pkgs/r
r 3.5.0 mro350_0 pkgs/r
r 3.5.0 r350_0 pkgs/r
r 3.5.1 mro351_0 pkgs/r
r 3.5.1 r351_0 pkgs/r
r 3.6.0 r36_0 pkgs/r
It seems the version of R we’re looking for is not hosted in the default channel.
Let’s specify conda-forge as the channel. It’s a community-managed collection of packages for conda.
In most cases you should find what you’re looking for there:
conda search R --channel conda-forge
The --channel (or -c) option allows us to specify a channel to use. We can also specify multiple channels at the same time:
conda search R -c conda-forge -c bioconda
(bioconda is a channel consisting of thousands of software packages related to biomedical research)
Searching on conda-forge gives us this result:
# Name Version Build Channel
r 3.3.1 r3.3.1_0 pkgs/r
r 3.3.1 r3.3.1_1 pkgs/r
r 3.3.2 r3.3.2_0 conda-forge
r 3.3.2 r3.3.2_0 pkgs/r
r 3.4.1 r3.4.1_0 conda-forge
r 3.4.1 r3.4.1_0 pkgs/r
r 3.4.2 h65d9972_0 pkgs/r
r 3.4.3 mro343_0 pkgs/r
r 3.4.3 r343_0 pkgs/r
r 3.5.0 mro350_0 pkgs/r
r 3.5.0 r350_0 pkgs/r
r 3.5.1 mro351_0 pkgs/r
r 3.5.1 r351_0 conda-forge
r 3.5.1 r351_0 pkgs/r
r 3.5.1 r351_1000 conda-forge
r 3.5.1 r35_1002 conda-forge
r 3.5.1 r35_1003 conda-forge
r 3.6.0 r36_0 pkgs/r
r 3.6 r36_1002 conda-forge
r 3.6 r36_1003 conda-forge
r 3.6 r36_1004 conda-forge
r 4.0 r40_1004 conda-forge
r 4.0 r40hd8ed1ab_1004 conda-forge
r 4.1 r41hd8ed1ab_1004 conda-forge
r 4.1 r41hd8ed1ab_1005 conda-forge
r 4.1 r41hd8ed1ab_1006 conda-forge
r 4.2 r42hd8ed1ab_1006 conda-forge
We can see the version we’re looking for is available through conda-forge.
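Long search listings can be narrowed with ordinary text tools. The following is a rough sketch using awk; filter_version is a hypothetical helper name, and it assumes the column layout shown in the listing above (name, version, build, channel).

```shell
# Keep only rows of `conda search` output whose Version column starts with
# the given prefix; header and status lines are dropped.
# filter_version is a hypothetical helper name.
filter_version() {
    awk -v prefix="$1" '$1 !~ /^#/ && index($2, prefix) == 1'
}

# Demonstration on three rows copied from the listing above.
printf '%s\n' \
    'r 3.6 r36_1004 conda-forge' \
    'r 4.1 r41hd8ed1ab_1006 conda-forge' \
    'r 4.2 r42hd8ed1ab_1006 conda-forge' | filter_version 4.
```

The demonstration prints only the 4.1 and 4.2 rows. With a live search it could be used as: conda search R --channel conda-forge | filter_version 4.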
To install a package in conda we can use the conda install command.
In this case, as there are many versions of R available, we will need to specify the version and build.
We can use the command in this format:
conda install -c <channel> <package_name>=<version>=<build_string>
Which in our case would be:
conda install -c conda-forge r=4.2=r42hd8ed1ab_1006
Conda will now download all the packages needed to run R.
In my case it was 108 packages, with a total size of 255 MB.
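To see what an install would pull in before committing to it, conda install accepts a --dry-run option. The sketch below wraps it in two small helpers; conda_spec and preview_install are hypothetical names chosen for illustration.

```shell
# Join a package name, version and build string into a conda package spec.
# conda_spec and preview_install are hypothetical helper names.
conda_spec() {
    printf '%s=%s=%s\n' "$1" "$2" "$3"
}

# Show what conda would install for a spec, without changing the environment.
preview_install() {
    conda install --dry-run --channel "$1" "$2"
}

SPEC="$(conda_spec r 4.2 r42hd8ed1ab_1006)"
echo "Spec: $SPEC"
echo "Preview with: preview_install conda-forge $SPEC"
```

Running preview_install with the environment activated lists the packages that would be downloaded, which gives an idea of the size before anything is installed.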
As conda keeps each environment separate from the others and from the OS, each environment needs an independent set of packages.
This can take up quite a lot of space if you install lots of software in the environment.
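To keep an eye on disk usage, the size of each environment can be checked with du, and conda clean can report what it would remove from the package cache. This is a sketch that assumes the default ~/miniconda3/envs location shown by conda info earlier; env_sizes is a hypothetical helper name.

```shell
# Print the disk usage of each environment directory under the given path.
# The default is the envs directory reported by `conda info` earlier;
# env_sizes is a hypothetical helper name.
env_sizes() {
    envs_dir="${1:-$HOME/miniconda3/envs}"
    if [ -d "$envs_dir" ]; then
        du -sh "$envs_dir"/* 2>/dev/null
    else
        echo "No environments directory at $envs_dir"
    fi
}

env_sizes || true

# List what `conda clean` would delete from the package cache (nothing is removed).
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run || true
fi
```

An environment that is no longer needed can be deleted with conda env remove --name <environment>.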
Now we have R installed, we can run it and check the version:
R --version
R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Once we’ve finished using the environment, we can deactivate it by typing:
conda deactivate
This returns us to the environment that was active before, usually base. Running the conda activate command without naming an environment switches to the base environment instead.
Base Environment#
When you install conda, it will activate the default “base” environment. This includes a Python installation and some core libraries and dependencies of conda.
Conda recommends that you avoid installing additional packages into your base environment. Any additional packages should be installed into their own conda environment.
The base environment will be activated each time you log in, but this may not be what you want.
You can prevent this behavior by using the conda config command:
conda config --set auto_activate_base false
Or deactivate the base environment for the current session with the command:
conda deactivate
This can be useful if you want to use the Python packages installed centrally on the HPC.
Using conda with sbatch#
You can include conda commands in your sbatch scripts:
#!/bin/bash
#SBATCH --job-name=parallel_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@lshtm.ac.uk # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=16 # Run 16 tasks
#SBATCH --mem=1gb # Total memory limit
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.log # Standard output and error log
date;hostname;pwd
conda activate R.4.2
echo "Running R in parallel on 16 cores"
Rscript Rcode.R
date
While you can include conda install commands in your sbatch script, I’d suggest configuring the environment before submitting your jobs.
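One way to keep a record of an environment that was configured ahead of time is to export its package list to a YAML file. The sketch below uses conda env export; the helper name export_env and the file name environment.yml are only illustrative choices.

```shell
# Write the package list of a named environment to a YAML file.
# export_env is a hypothetical helper name.
export_env() {
    conda env export --name "$1" > "$2"
}

# Example (the file name is an arbitrary choice):
#   export_env R.4.2 environment.yml
#   conda env create --file environment.yml   # recreate the environment elsewhere
```

Keeping the exported file next to your job scripts makes it easier to rebuild the same environment later if it is removed or needs to be set up under another account.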
Troubleshooting#
If you’re getting messages like:
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `BASH_FUNC_module'
/bin/sh: switchml: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `BASH_FUNC_switchml'
/bin/sh: _module_raw: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `BASH_FUNC__module_raw'
If these appear in the error output after submitting your job, try adding the following to your script after the SBATCH options:
source ~/.bashrc
When you activate an environment in your SBATCH script, use source instead of conda:
source activate <environment>
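Another approach that may work in batch scripts is to load conda's shell functions directly from the installation before activating. The sketch below assumes the default ~/miniconda3 install location from earlier in this guide; load_conda is a hypothetical helper name, while etc/profile.d/conda.sh is the script shipped inside the conda installation.

```shell
# Load conda's shell functions from the given installation directory so that
# `conda activate` works in a non-interactive batch shell.
# load_conda is a hypothetical helper; the default path is the install
# location used earlier in this guide.
load_conda() {
    conda_sh="${1:-$HOME/miniconda3}/etc/profile.d/conda.sh"
    if [ -f "$conda_sh" ]; then
        . "$conda_sh"
    else
        echo "conda.sh not found at $conda_sh" >&2
        return 1
    fi
}

if load_conda; then
    conda activate R.4.2 || echo "could not activate R.4.2" >&2
fi
```

If conda was installed somewhere else, pass that directory as the first argument, for example load_conda /path/to/miniconda3.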
Documentation#
More information about the conda commands can be found at: https://docs.conda.io/projects/conda/en/latest/commands/index.html