How are BWA/Bowtie/etc. indices built for multiple FASTA entries?


I'm confused about how an FM-index is built from a genome with multiple chromosomes (or, more generally, any multi-sequence file). I understand the principles of the BWT, but do aligners such as BWA and Bowtie compute a separate BWT for each sequence, or do they concatenate all sequences and then compute a single BWT?

I'm asking partly out of curiosity, but also because I need to include the mitochondrial and chloroplast genomes in an index. BWA has one indexing method (IS) that can't handle a 'database' larger than 2 Gbp, while the other method (BWTSW) can't handle databases smaller than 10 Mbp (and the organelle genomes are smaller than this).

I just don't know if 'database' in the documentation means the sum of all sequences or whether each sequence is considered a separate database.
If the sequences are all concatenated, then BWTSW should work fine. Otherwise, it seems that neither indexing method on its own works for both the large chromosomes I have to deal with and the tiny organelle genomes.
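
To make the concatenation hypothesis concrete, here is a toy sketch in Python of what I imagine it would look like. This is purely my own illustration and is not taken from BWA or Bowtie: the function names (`concatenate`, `bwt`, `locate`) and the example records are made up, and the naive rotation sort is only usable on tiny inputs.

```python
from bisect import bisect_right


def concatenate(records):
    """Join (name, sequence) records into one text and remember each start offset."""
    names, starts, parts = [], [], []
    offset = 0
    for name, seq in records:
        names.append(name)
        starts.append(offset)
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), names, starts


def bwt(text):
    """Naive BWT via sorted rotations (toy input only)."""
    text += "$"  # sentinel; sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def locate(pos, names, starts):
    """Map a coordinate in the concatenated text back to (sequence name, local position)."""
    i = bisect_right(starts, pos) - 1
    return names[i], pos - starts[i]


# Hypothetical records standing in for a chromosome and an organelle genome.
records = [("chr1", "ACGTACGT"), ("chrM", "GATTACA")]
text, names, starts = concatenate(records)

print(len(text))                  # total 'database' size under this hypothesis
print(bwt(text))                  # a single BWT over all sequences
print(locate(10, names, starts))  # which sequence does global position 10 fall in?
```

If this is roughly how the real indexers work, the 'database' size would be the sum of all sequence lengths, and the organelle genomes would not count as separate small databases.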

Thanks for your time!






