How to Speed Up BLASTp


Hello,

I have a FASTA file containing 140 protein sequences from distinct viruses, and I would like to identify which virus each protein comes from.


I am using a Linux cluster where BLAST is available as a module, and the virus and NCBI nr databases are stored in my own directory on the cluster (correct me if I have used the wrong terminology).

I set up my blastp run as follows:

 blastp -db nr -query proteins.fa -outfmt 6 -out ./output.txt  -num_threads 10 -max_target_seqs 1

and requested resources from the cluster as:

#PBS -l mem=64gb,nodes=10:ppn=1,walltime=10:00:00

It has been running for around 10 hours and nothing has been written to output.txt yet. Is there a better way to set the RAM, nodes, or processes per node to speed up the BLASTp run? Thank you so much!

Here is the info about the Linux cluster:

66 compute nodes. Each node has two 14-core Intel processors (2.40GHz) sharing 128 GB of memory.



You are requesting 10 nodes with 1 processor per node, but blastp runs as a single multithreaded process and can only use the cores of one node. You should use:

#PBS -l mem=128gb,nodes=1:ppn=14,walltime=10:00:00
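For what it's worth, a complete job script along these lines might look like the sketch below. It assumes your Torque version sets PBS_NP (the number of allocated cores) and PBS_O_WORKDIR (the submission directory); the module-loading step is left out because the module name is site-specific. The point is simply that -num_threads should match the ppn you request, rather than the 10 in the original command.

```shell
#!/bin/bash
#PBS -l mem=128gb,nodes=1:ppn=14,walltime=10:00:00

# Assumption: Torque exports PBS_NP with the number of allocated cores.
# Fall back to 14 (the ppn value above) if the variable is not set.
THREADS="${PBS_NP:-14}"

# Jobs start in $HOME, so move to the directory the job was submitted from.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Only run the search when blastp is on PATH (e.g. after loading the
# site-specific BLAST module, which is not shown here).
if command -v blastp >/dev/null 2>&1; then
    blastp -db nr -query proteins.fa -outfmt 6 -out ./output.txt \
        -num_threads "$THREADS" -max_target_seqs 1
else
    echo "blastp not found; load the BLAST module first" >&2
fi
```

Submit it with qsub as usual. If PBS_NP is not available on your system, hard-coding -num_threads 14 does the same thing.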

There are ways of splitting the input fasta file and submitting to several nodes, but with 140 sequences as input, it is not necessary.
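In case it is useful for a larger input later, here is a minimal sketch of one way to split a FASTA file. The function name split_fasta and the chunk naming scheme are made up for illustration; it deals records out round-robin with awk and assumes the file starts with a ">" header line.

```shell
# Hypothetical helper: split a FASTA file into N chunks, record by record.
# $1 = input FASTA, $2 = number of chunks, $3 = output prefix
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        # On each header line, pick the next chunk file in round-robin order.
        /^>/ { file = sprintf("%s.%d.fa", prefix, count % n); count++ }
        # Write header and sequence lines to the current chunk file.
        { print > file }
    ' "$1"
}
```

For example, split_fasta proteins.fa 4 chunk would write chunk.0.fa through chunk.3.fa, and each chunk could then be submitted as its own single-node job.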

You should contact the cluster administrators for instructions on how to properly use the Torque/PBS resource manager. Before downloading NT or NR, you should also ask whether these databases are already available at a centrally managed location; as they are widely used, this is commonly the case.

Another thing that may help is searching against a virus-only database, since at least 99.5% of nr entries are non-viral. Files for specific taxonomic divisions can be downloaded from this link:

ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/

There are two files (sprot and trembl) for each group, and you would need the .dat.gz files. Those are in the EMBL-style UniProt flat-file format, so you will need a program to convert them to FASTA. I know that a little utility called esl-reformat, distributed with the HMMER package, can do it, and there are likely to be others.
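If esl-reformat is not installed, the conversion is simple enough to sketch with awk. This is a dependency-free stand-in, not esl-reformat itself, and the function name dat_to_fasta is made up; it assumes standard UniProt flat-file records (ID, AC and SQ lines, terminated by //) and keeps only the first accession of each entry.

```shell
# Hypothetical helper: convert UniProt flat-file (.dat) records read from
# stdin to FASTA on stdout, with headers of the form >ACCESSION|ENTRY_NAME.
dat_to_fasta() {
    awk '
        /^ID /              { id = $2 }                          # entry name
        /^AC / && acc == "" { acc = $2; sub(/;$/, "", acc) }     # first accession
        /^SQ /              { printf(">%s|%s\n", acc, id); inseq = 1; next }
        /^\/\//             { inseq = 0; acc = ""; next }        # end of record
        inseq               { gsub(/ /, ""); print }             # sequence lines
    '
}

# Example usage (file names below are placeholders):
#   gunzip -c sprot_file.dat.gz trembl_file.dat.gz | dat_to_fasta > viruses.fa
#   makeblastdb -in viruses.fa -dbtype prot -out viruses
#   blastp -db viruses -query proteins.fa -outfmt 6 -out output.txt -num_threads 14
```

The makeblastdb step builds a protein database from the FASTA file, after which blastp can be pointed at it with -db instead of nr.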

