Briefly, I am working on the pangenome of a bacterium that secretes two toxins, Toxin A and Toxin B. I plan to run a bioinformatics analysis of 145 strains to identify the mutations in each of them and to find out which strains are the most severe, based on the SNPs in the genes coding for the two toxins. I started from raw data (reads). The first thing I did was assembly (with SPAdes), and after that I annotated all 145 strains with Prokka.
After extracting and analysing the gene sequences from the .ffn files, I noticed that in some genomes the toxin sequences are incomplete or fragmented (I compared the sequence length with the same sequence on NCBI and with the sequences from other strains). I don't know whether this comes from a problem in my raw data, from the assembly step, or from bad annotation.
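The length comparison described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library: the function names `parse_fasta` and `flag_short_genes`, the keyword match on the product description in the header, and the 0.9 threshold are assumptions made for the example, and the reference length has to be taken from the NCBI sequence.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def flag_short_genes(
    records: Iterable[Tuple[str, str]],
    keyword: str,
    ref_length: int,
    min_fraction: float = 0.9,  # assumed cutoff, adjust as needed
) -> List[Tuple[str, int, float]]:
    """Return (locus_tag, length, fraction_of_reference) for genes whose
    header contains `keyword` and whose length is below
    min_fraction * ref_length."""
    flagged = []
    for header, seq in records:
        if keyword.lower() not in header.lower():
            continue
        fraction = len(seq) / ref_length
        if fraction < min_fraction:
            # Assumes the first header token is the locus tag,
            # followed by the product description.
            locus_tag = header.split()[0]
            flagged.append((locus_tag, len(seq), round(fraction, 2)))
    return flagged


if __name__ == "__main__":
    # Toy records standing in for one strain's .ffn file (made-up data).
    example = [
        ">STRAIN1_00010 toxin A",
        "ATGAAA",
        "TTTGGG",
        ">STRAIN1_00011 toxin B",
        "ATGCCC",
    ]
    print(flag_short_genes(parse_fasta(example), "toxin", ref_length=12))
```

For a real file, the lines would come from something like `open("strain1.ffn")` (a hypothetical file name), with one call per strain and per toxin gene.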
If anyone has an idea how to improve any of these steps, or can suggest other steps I could take to achieve my objective, it would be a great help.
