Has anyone experienced a similar issue? I'm using BLAST+ version 2.11 on my school's cluster to BLAST FASTA files of about 23-60 MB (~1000-4000 sequences).

Specifically, I am using blastn with the nt database.

What is weird is that once BLAST finishes, some query sequences are missing from the output. For example, I am blasting contigs from metaSPAdes, and if my contig file has 1000 sequences, BLAST only gives me hits for 500 of them. I have run it a few times, and it is always the same contigs that are missing. I am not using any thresholds because I just want to see what is being matched to these sequences. I thought that maybe these sequences simply don't have any hits, so I extracted the missing sequences from the contig file and blasted a few of them separately, and I do get a result!
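For anyone trying to reproduce this, here is a minimal sketch of how the missing contigs can be listed by comparing the FASTA headers against the first column (qseqid) of the tabular output. The helper name `list_missing` and the tiny demo files are made up for illustration; the real inputs would be contig.fasta and contig_blast.output.

```shell
#!/bin/sh
set -eu

# list_missing QUERY_FASTA BLAST_TSV  (hypothetical helper, not part of BLAST+)
# Prints every FASTA ID (first word after '>') that never appears
# in column 1 (qseqid) of the tabular BLAST output.
list_missing() {
    awk '
        FILENAME == ARGV[1] { seen[$1] = 1; next }   # first file: BLAST output
        /^>/ {                                       # second file: FASTA headers
            id = substr($1, 2)                       # drop the leading ">"
            if (!(id in seen)) print id
        }
    ' "$2" "$1"
}

# Made-up demo data: three contigs, only contig_1 has a hit.
workdir=$(mktemp -d)
printf '>contig_1 len=10\nACGTACGTAC\n>contig_2 len=8\nGGGGCCCC\n>contig_3\nTTTTAAAA\n' \
    > "$workdir/contig.fasta"
printf 'contig_1\tsubjA\t10\t100\t99.0\t1e-5\t20\t9606\tsome title\n' \
    > "$workdir/contig_blast.output"

missing=$(list_missing "$workdir/contig.fasta" "$workdir/contig_blast.output")
echo "$missing"
```

The IDs are printed in the order they occur in the FASTA file, so the list can be used directly to pull those sequences out and BLAST them on their own.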

Why is blastn skipping some contigs entirely? The exact command I am using is below:

blastn -db nt -num_threads 32 -max_target_seqs 1 -max_hsps 1 \
        -outfmt "6 qseqid sseqid qlen slen pident evalue score staxids stitle" \
        -query contig.fasta > contig_blast.output && echo "DONE" contig_blast.output

I just want the top hit for each of my contigs, but I need it for all the contigs in the file.



