I guess this is more of a statistical question. If it violates the rules here, I can post it somewhere else.

I am doing amplicon-seq analysis with Illumina reads. The reads are sequenced paired-end. The issue I have is that the reverse reads get really short (25% of their original length) when trimmed from the outside with a minimum Phred score of 30. The reads then barely overlap.

When trimming with a minimum Phred score of 25, the problem disappears.

Also, I am looking for SNPs.

What I am asking is: how do I calculate a sufficient (whatever that means) sequencing depth to distinguish SNPs from sequencing errors, given the change in the minimum Phred score?

For example, a minimum Phred score of 25 means the per-base base-calling error is at most about 0.3% (10^(-25/10) ≈ 0.32%). What sequencing depth do I need to get a statistically significant difference between a SNP and a sequencing error?
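To make the numbers concrete, here is a minimal sketch of the Phred-to-error conversion, using the standard definition P = 10^(-Q/10). The function name `phred_to_error_prob` is just made up for illustration.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a base-calling error probability.

    Uses the standard definition P = 10^(-Q/10).
    """
    return 10 ** (-q / 10)


# Compare the two trimming thresholds mentioned above (plus Q20 for reference).
for q in (20, 25, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4%}")
```

So going from Q30 to Q25 raises the worst-case per-base error from 0.1% to roughly 0.32%.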

Another solution would be to keep only the SNPs where the alternative allele frequency is above 0.3%, and say a depth of 100 at that SNP should be all right (I think nobody will ask for this in a publication, but I want to understand it for myself).
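One way to put numbers on this idea is a simple binomial model: if every read at a position independently shows an erroneous base with probability p, the number of error reads at depth n follows Binomial(n, p). The sketch below (Python, standard library only; `binom_tail` and `min_alt_reads` are made-up names for illustration) finds the smallest alternative-read count that errors alone would reach with probability below a chosen alpha. It assumes independent errors, treats every error as producing the same alternative base (which is conservative), and ignores multiple testing across positions, so it is a rough guide and not a replacement for a proper variant caller.

```python
from math import exp, lgamma, log


def binom_pmf(n: int, i: int, p: float) -> float:
    """P(X == i) for X ~ Binomial(n, p), computed in log space (needs 0 < p < 1)."""
    log_pmf = (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1 - p)
    )
    return exp(log_pmf)


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))


def min_alt_reads(depth: int, error_rate: float, alpha: float = 0.05) -> int:
    """Smallest alt-read count k with P(X >= k | errors only) < alpha.

    Returns depth + 1 if no count up to the full depth satisfies this.
    """
    for k in range(depth + 1):
        if binom_tail(depth, k, error_rate) < alpha:
            return k
    return depth + 1


# Assumed example values: error rate 0.3% (roughly Q25), a few candidate depths.
for depth in (100, 500, 1000):
    k = min_alt_reads(depth, 0.003)
    print(f"depth {depth}: need at least {k} alt reads ({k / depth:.2%} of reads)")
```

Under these assumptions, a depth of 100 with a 0.3% error rate gives a threshold of 2 alternative reads at alpha = 0.05. A stricter alpha (for example after correcting for the number of positions tested) raises that threshold.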

Thanks a lot

P.S. I would be happy if you could share any literature about this 🙂
