I have downloaded some previously published raw data (FASTQ files). I first ran QC and found adapter content in both the forward and reverse reads.
Below are the FastQC results for both the forward and reverse reads before adapter trimming:
To remove the adapters, I ran cutadapt as follows:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o tr_sample_R1.fastq.gz -p tr_sample_R2.fastq.gz sample_R1.fastq.gz sample_R2.fastq.gz
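To illustrate why trimming changes the read lengths, here is a deliberately simplified model of 3' adapter trimming. When a DNA fragment is shorter than the read length, the read runs into the adapter, and everything from the adapter onward is cut off, so reads end up with different lengths. This sketch uses exact matching only. cutadapt itself also tolerates mismatches and indels (maximum error rate `-e`, default 0.1) and by default requires at least 3 overlapping bases (`-O`). The function name `trim_3prime_adapter` and the toy reads are hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching only.

    Simplified illustration: unlike cutadapt, no mismatches or indels are allowed.
    Assumes min_overlap >= 1.
    """
    # Case 1: the complete adapter occurs in the read; cut from its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway into the adapter, so only an adapter prefix
    # is visible at the 3' end. Try the longest possible overlap first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    # No adapter found: the read keeps its full length.
    return read


# Read 1 adapter, taken from the -a option of the cutadapt command above.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Hypothetical 20-nt reads from fragments of decreasing length.
reads = [
    "TTGACCGATTACAGGCATTA",           # fragment at least as long as the read: no adapter
    "TTGACCGATTAC" + ADAPTER_R1[:8],  # 12-nt fragment, then adapter
    "TTGACC" + ADAPTER_R1[:14],       # 6-nt fragment, then adapter
]
trimmed = [trim_3prime_adapter(r, ADAPTER_R1) for r in reads]
print([len(r) for r in trimmed])  # [20, 12, 6]
```

Reads that started at one fixed length come out at 20, 12 and 6 nt. FastQC's Sequence Length Distribution module is documented to raise a warning whenever reads are not all the same length, so a warning there after trimming is not by itself a sign of a failed run.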
After adapter trimming, the results look like this:
So, I have some questions:
1) Before adapter trimming, the sequence length distribution looked fine, but after trimming it looks as if something went wrong. Why is that?
2) I see some bias in the first 10-15 bases. What should I do about it? Is it really a problem?
3) Why does the GC content distribution have multiple peaks?
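To inspect these three points outside FastQC, here is a rough Python sketch that recomputes read-length counts, base composition over the first bases, and per-read GC percentage. It is an illustration under stated assumptions, not FastQC's implementation: it assumes unwrapped 4-line FASTQ records, ignores N bases when computing fractions, and all function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) from a plain or gzipped FASTQ.

    Assumes each record spans exactly 4 lines (no wrapped sequences).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def length_distribution(seqs: Iterable[str]) -> Counter:
    """Count reads per length (roughly what 'Sequence Length Distribution' plots)."""
    return Counter(len(s) for s in seqs)


def leading_base_composition(seqs: Iterable[str], n_positions: int = 15) -> list[dict[str, float]]:
    """Fraction of A/C/G/T at each of the first n_positions bases (N is ignored)."""
    counts = [Counter() for _ in range(n_positions)]
    for s in seqs:
        for pos, base in enumerate(s[:n_positions]):
            counts[pos][base] += 1
    result = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        result.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return result


def gc_histogram(seqs: Iterable[str]) -> Counter:
    """Count reads per whole-number GC percentage (N bases excluded from the denominator)."""
    hist = Counter()
    for s in seqs:
        acgt = sum(s.count(b) for b in "ACGT")
        if acgt:
            hist[round(100 * (s.count("G") + s.count("C")) / acgt)] += 1
    return hist


# Toy reads standing in for real data; for a real file one could use e.g.
# seqs = list(read_fastq_sequences("tr_sample_R1.fastq.gz"))
seqs = ["ACGTAC", "GGCCGG", "ATATAT", "GCGC"]
print(length_distribution(seqs))
print(leading_base_composition(seqs, n_positions=3))
print(gc_histogram(seqs))
```

Running `length_distribution` on the files before and after trimming shows how many reads were shortened. Tabulating `gc_histogram` separately for different subsets of reads (for example, by length) is one way to check whether an extra GC peak comes from a particular group of reads.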
Please help me clear up these doubts. Thanks in advance.