I am following Erin Young's workflow (github.com/CDCgov/SARS-CoV-2_Sequencing/tree/master/protocols/BFX-UT_ARTIC_Illumina) to assemble SARS-CoV-2 genomes.
To sequence the genomes, we used the QIAseq Library kit, which is based on the ARTIC workflow, on the Illumina platform. One important step of the QIAseq kit is fragmentation before adapter ligation.
In the bioinformatics workflow, after mapping the reads to the reference genome, the primers have to be trimmed from the aligned reads so that the calls reflect the true variation of the virus rather than primer sequence. We used ivar for this step.
I read that, because of the fragmentation step, it is recommended to keep reads that do not contain primer sequence (covid19.galaxyproject.org/artic/#a-galaxy-workflow-for-the-analysis-of-illumina-paired-end-sequenced-artic-amplicon-data). The problem is that when I use the "-e" option of ivar, which includes reads without primers, the consensus sequence contains more Ns than the one generated without "-e".
Why is this happening? Since more reads are being included, I expected a better consensus, not a worse one.
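To make the comparison concrete, here is a minimal Python sketch of how the Ns in the two consensus sequences could be counted and located. It assumes each consensus is a single-record FASTA; the function names and the example file names in the usage note are hypothetical and not part of ivar or the workflow above.

```python
import re


def read_fasta_sequence(text):
    """Return the upper-cased sequence of the first record in FASTA text."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts).upper()


def n_runs(sequence):
    """Return 1-based inclusive (start, end) coordinates of each run of Ns."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"N+", sequence)]


def summarize(sequence):
    """Collect the length, total N count, and N runs of a consensus sequence."""
    return {
        "length": len(sequence),
        "n_count": sequence.count("N"),
        "n_runs": n_runs(sequence),
    }


def summarize_file(path):
    """Convenience wrapper: summarize the first record of a FASTA file."""
    with open(path) as handle:
        return summarize(read_fasta_sequence(handle.read()))
```

Calling summarize_file("consensus_with_e.fa") and summarize_file("consensus_without_e.fa") (hypothetical file names) and comparing the two "n_runs" lists would show whether the extra Ns are spread across the genome or concentrated in particular regions.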