Dear all,

Recently, I was asked to preprocess some FASTQ files produced on an Illumina sequencer (I don't know which machine produced the data).

Here are the header and sequence lines of one read from a forward FASTQ file:

@A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA
CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCCCTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA

When I asked the company for the adapter sequences, they gave me D710-501 TCCGCGAATATAGCCT (this is for one sample, covering both forward and reverse reads).

In the header of the FASTQ file, however, the index appears as TCCGCGAA+AGGCTATA.
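
The two strings can be compared mechanically. The Python sketch below is only a sanity check. It assumes (the company has not confirmed this) that the 16-base string is an 8-base i7 index followed by an 8-base i5 index. The helper name `reverse_complement` is made up for illustration.

```python
# Sanity check: how does the company's string relate to the FASTQ header?
# Assumption (not confirmed): the 16-base string is i7 (8 bases) + i5 (8 bases).

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

company_string = "TCCGCGAATATAGCCT"   # provided by the company for D710-501
header_indexes = "TCCGCGAA+AGGCTATA"  # taken from the FASTQ header

# Split both strings into their two 8-base halves.
i7, i5 = company_string[:8], company_string[8:]
header_i7, header_i5 = header_indexes.split("+")

# Compare the first half directly and the second half as a reverse complement.
print(i7 == header_i7)
print(reverse_complement(i5) == header_i5)
```

Both comparisons hold for the strings above: the first half matches the header as given, and the second half matches it after reverse complementing, which would explain why the two strings look different.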

Illumina's documentation, on the other hand, gives the following:

TruSeq DNA and RNA CD Indexes

Index 1 (i7) Adapters
CTAGCGCT
GTGTAGAC
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG
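
To see how this template relates to the read shown earlier, one can fill in the `[i7]` placeholder with the first index from the FASTQ header and search the read for it. This is a rough sketch using plain string matching, not what fastp or Trimmomatic do internally; the variable names are chosen here for illustration.

```python
# Fill the [i7] placeholder in Illumina's template and look for it in the read.
read = (
    "CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCC"
    "CTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA"
)
template = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[i7]ATCTCGTATGCCGTCTTCTGCTTG"
i7 = "TCCGCGAA"  # index 1 from the FASTQ header

full_adapter = template.replace("[i7]", i7)  # template with the index filled in
prefix = template.split("[i7]")[0]           # constant part before the index

# Position of "prefix + index" in the read (-1 if absent).
start = read.find(prefix + i7)
print("prefix + index found:", start != -1)

# Exact match of the whole filled-in adapter, including the part after the index.
print("full adapter found exactly:", full_adapter in read)
```

In this read the template prefix followed by the index matches exactly, while the bases after the index differ from the template at a couple of positions near the 3' end of the read, so an exact full-length match should not be expected.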

I want to remove the adapters from the FASTQ files, but I am confused about how to specify the adapter sequences in the adapter file that fastp or Trimmomatic takes as input.

For example:

Is it okay to write TCCGCGAATATAGCCT in the adapter FASTA file, or should I give the full adapter sequence? By that I mean the following, where [i7] in the Illumina documentation is replaced with the index sequences from the header of the FASTQ file:

Read 1 adapter:

GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[TCCGCGAA]ATCTCGTATGCCGTCTTCTGCTTG

Read 2 adapter:

GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[AGGCTATA]ATCTCGTATGCCGTCTTCTGCTTG


