1 hour ago by lieven.sterck

VIB, Ghent, Belgium

Good question, but there is no easy, straightforward answer, I'm afraid.

There are a number of things to take into account here:

If you add up the bit scores of all the HSPs you will often "overcount": HSPs in protein alignments can overlap each other, so you will double count those regions in the final bit score.

Taking the 'best HSP' is not a bad approach. Given that you work with protein sequences, you will have fewer split alignments (with nucleotides you see that more often), so the best-scoring HSP will in most cases be a continuous stretch of alignment.
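To illustrate the 'best HSP' approach, here is a minimal sketch. It assumes your BLAST results are in the default tabular format (`-outfmt 6`), where columns 1 and 2 are the query and subject IDs and column 12 is the bit score. The function name `best_hsp_bitscores` is just made up for this example.

```python
def best_hsp_bitscores(lines):
    """Return {(query_id, subject_id): highest HSP bit score}.

    Assumes default BLAST tabular columns (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pair = (fields[0], fields[1])
        bitscore = float(fields[11])
        # Keep only the highest-scoring HSP for each query/subject pair.
        if bitscore > best.get(pair, float("-inf")):
            best[pair] = bitscore
    return best
```

You can pass it an open file handle directly, e.g. `best_hsp_bitscores(open("results.tsv"))`, where `results.tsv` is a placeholder file name.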

If you want very accurate results and have time to do some scripting, the best way is to add up the bit scores. Here you need to take into account that you can only add up non-overlapping regions (you sort of need to re-create the full alignment from the given HSPs). If, on the other hand, you want a reliable result but don't want to spend much time on it, go for the best HSP approach. In the vast majority of cases this will be an excellent approximation (certainly for protein sequences), and it can be parsed directly and efficiently from the original BLAST output.
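A rough sketch of the adding-up approach could look like the code below. Note that this is a simplification, not a full reconstruction of the alignment: it takes HSPs from highest to lowest bit score and simply skips any HSP that overlaps an already accepted one on the query or on the subject, instead of trimming it and rescoring the remaining part. The input layout (tuples of `qstart, qend, sstart, send, bitscore` for one query/subject pair) and the function names are assumptions made for this example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] share a position."""
    return a_start <= b_end and b_start <= a_end


def sum_nonoverlapping_bitscores(hsps):
    """Greedily sum bit scores of HSPs that do not overlap each other.

    hsps: iterable of (qstart, qend, sstart, send, bitscore) tuples,
    all belonging to the same query/subject pair.
    """
    accepted = []  # coordinates of the HSPs kept so far
    total = 0.0
    # Visit the highest-scoring HSPs first so they take priority.
    for hsp in sorted(hsps, key=lambda h: h[4], reverse=True):
        q_lo, q_hi = sorted(hsp[0:2])
        s_lo, s_hi = sorted(hsp[2:4])
        clash = any(
            overlaps(q_lo, q_hi, kq_lo, kq_hi) or overlaps(s_lo, s_hi, ks_lo, ks_hi)
            for kq_lo, kq_hi, ks_lo, ks_hi in accepted
        )
        # Overlapping HSPs are skipped entirely (not trimmed), so the
        # result is a conservative total that avoids double counting.
        if not clash:
            accepted.append((q_lo, q_hi, s_lo, s_hi))
            total += hsp[4]
    return total
```

Skipping overlapping HSPs outright will slightly undercount when the overlap is small; trimming the overlapping part would be more precise, but a bit score cannot be split proportionally without rescoring the alignment.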

