# AlphaFold

AlphaFold is a deep learning program developed by DeepMind that predicts the 3D structure of a protein from its amino acid sequence.

You can find more information on the AlphaFold website (https://alphafold.ebi.ac.uk/) and GitHub page (https://github.com/deepmind/alphafold).

We host a copy of the database in the /home/alphafold folder.

There you will also find the container (alphafold.sif) and a script to run it (run_alphafold_singularity.sh).

Here is an example SBATCH script to submit a job to the HPC:

```bash
#!/bin/bash
#SBATCH --job-name=alphafold_8_core   # Job name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=[your_email@lshtm.ac.uk]     # Where to send mail
#SBATCH --ntasks=8                    # Number of CPU cores
#SBATCH --mem=20gb                    # Job memory request
#SBATCH --time=10:00:00               # Time limit hrs:min:sec
#SBATCH --output=alphafold_%j.log     # Standard output and error log
pwd; hostname; date

export CUDA_VISIBLE_DEVICES=0         # Use the first GPU
export OPENMM_CPU_THREADS=8           # Match the number of cores requested

module load singularity

/home/alphafold/run_alphafold_singularity.sh --fasta_paths /home/padmacor/fasta.fasta --max_template_date 2100-01-01
```

You'll need to provide the run_alphafold_singularity.sh script with two options: the path to the FASTA file you want to analyse, and the maximum template date.
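The FASTA input is a plain text file with a header line followed by the sequence. A minimal single-sequence example (the filename and sequence here are made up purely for illustration) can be created like this:

```shell
# Write a minimal single-sequence FASTA file.
# The sequence below is a short arbitrary example, not a real target.
cat > fasta.fasta <<'EOF'
>example_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK
EOF

# FASTA headers start with '>'; a quick sanity check:
head -n 1 fasta.fasta   # prints: >example_protein
```

Pass the path to this file via --fasta_paths when submitting your job.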

AlphaFold will only search for templates released before the date specified by the --max_template_date parameter; this can be used to exclude certain templates during modelling.
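For example, assuming the same cluster paths as in the script above, a run that ignores any templates released after mid-2020 (the cutoff date is chosen purely for illustration) would look like:

```shell
# Exclude PDB templates released after 2020-05-14 (date is illustrative);
# useful when you want a prediction uninformed by recently solved structures.
/home/alphafold/run_alphafold_singularity.sh \
    --fasta_paths /home/padmacor/fasta.fasta \
    --max_template_date 2020-05-14
```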

The script requests 8 cores and 20 GB RAM as its allocation.

This should be sufficient RAM, and is about the point of diminishing returns in the number of CPUs per task.

Moving from 4 CPUs to 8 cuts the runtime by about 50%, while moving up to 16 CPUs only produces a further 25% decrease.

If you need to increase these, change --mem=** to the amount of RAM you need, then change --ntasks=** and OPENMM_CPU_THREADS=** to the number of CPUs you want.
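For example, a hypothetical 16-CPU, 40 GB job (values chosen for illustration, not a recommendation) would change the relevant lines in the script to:

```shell
#SBATCH --ntasks=16                   # Number of CPU cores
#SBATCH --mem=40gb                    # Job memory request

export OPENMM_CPU_THREADS=16          # Keep this in sync with --ntasks
```

Keeping OPENMM_CPU_THREADS equal to the requested core count ensures the relaxation stage actually uses all the CPUs you were allocated.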