I aligned a set of reads to the C. elegans genome. The alignment rates were around 80%, except for two samples, which only reached 40%. I BLASTed the unaligned reads and they seem to come from Drosophila (we have no idea why). I aligned the samples again, this time against the Drosophila genome, and those two samples got an alignment rate of around 40% as well. Because the sample size is small, I am considering discarding the unmapped reads rather than the whole samples. I assume a normalization like TMM could reduce the noise caused by the reduced counts, and if the PCA clusters make sense, I would use the data in downstream analysis. Any opinions on this? Or should I just discard those samples?
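To make the idea concrete, here is a rough sketch of the kind of sanity check I have in mind before running a proper PCA. It is not TMM (that would be edgeR's `calcNormFactors` in R). It only scales each sample by its own library size (log-CPM) and then checks how well a reduced-depth sample correlates with a normal one. The sample names, the counts, and the 0.5 pseudo-count are all made up for illustration.

```python
import math

def log_cpm(counts, prior=0.5):
    """Convert raw gene counts to log2 counts-per-million, per sample.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    prior:  small pseudo-count so zero counts do not hit log(0) (an assumption,
            not the exact edgeR formula).
    """
    result = {}
    for sample, values in counts.items():
        library_size = sum(values)  # only reads that mapped to C. elegans genes
        result[sample] = [
            math.log2((c + prior) / library_size * 1e6) for c in values
        ]
    return result

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical counts: "contaminated" has roughly half the usable depth.
raw = {
    "normal":       [100, 200, 50, 400, 250],
    "contaminated": [50, 100, 25, 200, 125],
}

normalized = log_cpm(raw)
r = pearson(normalized["normal"], normalized["contaminated"])
print(f"correlation after library-size scaling: {r:.4f}")
```

If the only effect of the contamination is lower depth, the two samples should still correlate well after scaling. If they do not, that would suggest the problem goes beyond the reduced counts.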
