I am attempting a de novo assembly of viral sequencing data for downstream SNP calling and further analysis. However, I am running into a few problems that I am not sure how to overcome.
Unfortunately, because of the way we cultured these viruses, the sequencing data are contaminated with the host cells' genomic material. As such, I know that I need to do some host read subtraction to remove any human reads from the data pool. Normally I would do this by aligning the reads to a reference genome; however, the virus we are using is a mutant with a few genes not present in the wild type, and I am worried that assembling using a reference may filter out any reads associated with these genes. So essentially, I need a way to filter out the reads that are derived from the human genome while keeping any reads associated with my mutant.
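To make the filtering step concrete, here is a minimal sketch of the kind of subtraction I have in mind. It assumes the reads have already been aligned against a human reference only (not the viral one) and that the aligner wrote standard SAM output; the function names are my own for illustration, and the only thing it relies on is the FLAG bits defined in the SAM format. The idea is that a read (or pair) that does not align to human is kept regardless of whether it comes from a wild-type or mutant-specific viral gene.

```python
# Sketch only: keep reads that did NOT align to the host (human) reference.
# Assumes SAM text input from an alignment against human alone.

PAIRED = 0x1         # SAM flag bit: read is part of a pair
UNMAPPED = 0x4       # SAM flag bit: this read did not align
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate did not align


def is_non_host(flag: int) -> bool:
    """True if the read, and its mate when paired, failed to align to the host."""
    if not flag & UNMAPPED:
        return False
    if flag & PAIRED and not flag & MATE_UNMAPPED:
        # Mate hit the host genome, so drop the whole pair to be conservative.
        return False
    return True


def non_host_records(sam_lines):
    """Yield (name, sequence, quality) for reads that did not align to the host."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if is_non_host(flag):
            yield name, seq, qual


def to_fastq(records):
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

The surviving reads could then go into the de novo assembler. Dropping a pair when only one mate hits human is a conservative choice on my part and could be relaxed.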
I am sure there is a solution, but I am a little stumped, and searching around has not turned up anything I can apply...
Any advice would be greatly appreciated, and I am happy to give further clarification!