AlphaFold
AlphaFold is a deep learning program developed by DeepMind that predicts protein structure.
You can find more information on their website https://alphafold.ebi.ac.uk/ and their GitHub page https://github.com/deepmind/alphafold.
We host a copy of the database in the /home/alphafold folder.
There you will also find the container image alphafold.sif and a wrapper script, run_alphafold_singularity.sh, to run it.
Here is an example SBATCH script to submit a job to the HPC:
#!/bin/bash
#SBATCH --job-name=alphafold_8_core # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=[your_email@lshtm.ac.uk] # Where to send mail
#SBATCH --ntasks=8 # Number of CPU cores
#SBATCH --mem=20gb # Job memory request
#SBATCH --time=10:00:00 # Time limit hrs:min:sec
#SBATCH --output=alphafold_%j.log # Standard output and error log
pwd; hostname; date
export CUDA_VISIBLE_DEVICES=0
export OPENMM_CPU_THREADS=8
module load singularity
/home/alphafold/run_alphafold_singularity.sh --fasta_paths /home/padmacor/fasta.fasta --max_template_date 2100-01-01
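A sketch of the full submission workflow, assuming you save the script above as alphafold_job.sh (the filename and the FASTA path are placeholders; sbatch and squeue are the standard Slurm commands):

```shell
# Save the job script, sanity-check it, then submit it to the scheduler.
cat > alphafold_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=alphafold_8_core
#SBATCH --ntasks=8
#SBATCH --mem=20gb
#SBATCH --time=10:00:00
#SBATCH --output=alphafold_%j.log
export OPENMM_CPU_THREADS=8
module load singularity
/home/alphafold/run_alphafold_singularity.sh --fasta_paths "$HOME/fasta.fasta" --max_template_date 2100-01-01
EOF

bash -n alphafold_job.sh && echo "syntax OK"  # catch shell typos before queueing
# sbatch alphafold_job.sh                     # submit; Slurm prints "Submitted batch job <id>"
# squeue -u "$USER"                           # check the job's position in the queue
# tail -f alphafold_<id>.log                  # follow progress once it starts
```

The submission commands are commented out here since they only work on the cluster's login nodes.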
You’ll need to provide the run_alphafold_singularity.sh script with two options: --fasta_paths, the path to the FASTA file you want to analyse, and --max_template_date.
AlphaFold will search for the templates available before the date specified by the --max_template_date parameter; this can be used to exclude certain templates during modelling.
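The input must be a standard FASTA file. A minimal single-record example (the record name and amino-acid sequence below are arbitrary placeholders, not a real target):

```shell
# A FASTA record is a '>' header line followed by the amino-acid sequence.
cat > fasta.fasta <<'EOF'
>example_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ
EOF

grep -c '^>' fasta.fasta  # → 1 (one sequence record)
```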
The script requests 8 cores and 20GB of RAM as its allocation.
This should be sufficient RAM, and is about the point of diminishing returns in the number of CPUs per task.
Moving from 4 CPUs to 8 roughly halves the run time, while 16 CPUs only produces a further 25% decrease.
If you need to increase the allocation, change the --mem=** option to the amount of RAM you need, then change --ntasks=** and OPENMM_CPU_THREADS=** to the number of CPUs you want.
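For example, a hypothetical 16-CPU, 40GB variant would change only these lines of the job script; everything else stays as in the example above:

```shell
#SBATCH --ntasks=16           # CPU cores for the job
#SBATCH --mem=40gb            # RAM for the job
export OPENMM_CPU_THREADS=16  # keep OpenMM's thread count in sync with --ntasks
```

Keeping OPENMM_CPU_THREADS equal to --ntasks avoids oversubscribing the cores Slurm has actually allocated.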