Posted 2 hours ago by ali_karimnezhad

I have 5 FASTA files (corresponding to 5 strains) downloaded from NCBI, and I am going to use BEAST2 to build a phylogenetic tree. Each FASTA file contains multiple contigs, and I have a reference genome. For example, one of my FASTA files comes from here and has 67 contigs, while the corresponding reference genome has only two chromosomes.

Following this post, I aligned each of my FASTA files to the reference genome with bwa mem to generate a SAM file, and then ran a few steps (samtools view/sort/index, etc.) to produce a final FASTA file. This new FASTA file contains only two sequences, one per reference chromosome. I repeated these steps for my other 4 original FASTA files and got 4 more new FASTA files. Now I want to build the tree in BEAST2, but its input requires 5 aligned sequences, each representing one species, whereas each species in my new FASTA files has two sequences.
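
If every consensus FASTA was built against the same reference, the two chromosome sequences for a strain are already in reference coordinates, so one straightforward option is to concatenate them, in the same order for every strain, into a single sequence per strain. The sketch below assumes that all files list the chromosomes in the same order and that the consensus step did not apply insertions or deletions relative to the reference. If it did, lengths can differ and the sequences no longer line up column for column; the length check raises an error in that case. `read_fasta`, `concatenate_strains` and `to_fasta` are hypothetical helpers, not part of any particular tool.

```python
# Minimal sketch: build one sequence per strain by concatenating its
# per-chromosome consensus records in file order.
# Assumption: every consensus FASTA was made against the same reference,
# lists the chromosomes in the same order, and kept reference coordinates
# (no indels applied), so equal lengths mean column-for-column alignment.


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_strains(strain_fastas):
    """Turn {strain: FASTA text} into {strain: one concatenated sequence}."""
    result = {}
    expected_lengths = None
    for strain, text in strain_fastas.items():
        records = read_fasta(text)
        lengths = [len(seq) for _, seq in records]
        # Every strain must match the first one chromosome by chromosome;
        # otherwise the concatenated sequences would not line up.
        if expected_lengths is None:
            expected_lengths = lengths
        elif lengths != expected_lengths:
            raise ValueError(
                f"{strain}: chromosome lengths {lengths} differ from {expected_lengths}"
            )
        result[strain] = "".join(seq for _, seq in records)
    return result


def to_fasta(seqs, width=60):
    """Format {name: sequence} as multi-FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in seqs.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy in-memory example with two "chromosomes" per strain.
    demo = {
        "strain1": ">chr1\nACGT\n>chr2\nGG\n",
        "strain2": ">chr1\nACGA\n>chr2\nGC\n",
    }
    print(to_fasta(concatenate_strains(demo)), end="")
```

For the real data, the input dictionary would map each strain name to the text of its consensus file (for example `open("strain1_consensus.fa").read()`, with hypothetical file names), and writing the `to_fasta` output to a single file gives one sequence per strain. Regions not covered by any contig should appear as N or gaps in each consensus, but that depends on how the consensus was generated.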

Does my approach make sense? Is there a way to generate a single sequence for each species? I was thinking about SNP calling, but I do not know whether it is the right approach.
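
If the SNP route is taken instead, the general idea is to keep only the alignment columns that differ between strains. Given equal-length, reference-coordinate sequences (such as one concatenated sequence per strain), the filtering step itself is simple, as in the sketch below. Treating N, `-` and `?` as missing data is an assumption, as is counting IUPAC ambiguity codes as distinct states; `variable_sites` is a hypothetical helper, not part of any specific tool. Dropping invariant sites changes what the data represent, and tree inference from SNP-only alignments generally needs an ascertainment-bias correction, so it is worth checking how BEAST2 handles that before using such a file.

```python
# Minimal sketch: keep only variable (SNP) columns of an equal-length
# alignment. Hypothetical helper, not part of any specific tool.
# Missing-data and gap characters are ignored when deciding whether a
# column varies; any other symbol (including IUPAC ambiguity codes such
# as R or Y) counts as its own state.


def variable_sites(alignment, missing="N-?"):
    """Return ({name: SNP-only sequence}, [kept column indices])."""
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")
    ignored = set(missing)
    kept = []
    for i in range(length):
        # Distinct non-missing symbols observed in column i.
        states = {seq[i] for seq in seqs} - ignored
        if len(states) > 1:
            kept.append(i)
    filtered = {name: "".join(seq[i] for i in kept) for name, seq in zip(names, seqs)}
    return filtered, kept


if __name__ == "__main__":
    aln = {"strain1": "ACGTGG", "strain2": "ACGAGC", "strain3": "ACNTGC"}
    snps, columns = variable_sites(aln)
    print(columns)  # -> [3, 5]
    print(snps)     # -> {'strain1': 'TG', 'strain2': 'AC', 'strain3': 'TC'}
```

In the toy example, column 2 is dropped because its only non-missing state is G, while columns 3 and 5 are kept because the strains differ there.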

