I have reads from DNA sequencing of 5 pools. The reads look like this:
@NS500455:80:HG7TNBGXB:1:11101:17723:1055 1:N:0:ATCACG
ACTTANGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCNAGCTCCAATTCGCCCTATAGTGAGTCGTAT
+
/AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Each read is the Illumina primer followed by the transposon construct, which I had to filter out. So I used cutadapt:
for file in /FASTQ/*
do
    cutadapt -g ACTTAAGTGTATGTAAACTTCCGACTTCAACTG --discard-untrimmed --minimum-length=35 "$file" -o "`basename -s .fastq &
done
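To check how many reads actually begin with the transposon end before trimming, something like the sketch below might help. The function name `count_prefix_reads` and the 13-base prefix are purely illustrative choices, and it assumes plain 4-line FASTQ records. It matches exactly, so reads with an N in the prefix (like the example read above) are not counted, whereas cutadapt tolerates some mismatches.

```shell
# Hypothetical helper (not part of cutadapt): count FASTQ records whose
# sequence line starts with an exact prefix.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
count_prefix_reads() {
    prefix="$1"
    fastq="$2"
    # NR % 4 == 2 selects the sequence line of each 4-line record;
    # index(...) == 1 means the prefix occurs at the very start of the read.
    awk -v p="$prefix" '
        NR % 4 == 2 { total++; if (index($0, p) == 1) hits++ }
        END { printf "%d of %d reads start with %s\n", hits, total, p }
    ' "$fastq"
}

# Example call; the prefix is the first 13 bases of the sequence given to -g:
# count_prefix_reads ACTTAAGTGTATG A1_S1_R1_001.fastq
```

If the count is far below the number of reads cutadapt reports as trimmed, the difference would come from reads that only match with mismatches.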
Then I tried aligning the reads with Bowtie2 using this command:
./bowtie2-align-s -x rat/rn4 -U A1_S1_R1_001.fastq --sensitive-local -S A_align.sam
Bowtie2 fails to align the majority of them:
2648325 reads; of these:
  2648325 (100.00%) were unpaired; of these:
    2242006 (84.66%) aligned 0 times
    309307 (11.68%) aligned exactly 1 time
    97012 (3.66%) aligned >1 times
15.34% overall alignment rate
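For reference, the overall rate in that summary is just the uniquely aligned plus multiply aligned reads divided by the total. A small sketch that recomputes it from the counts above (the variable names are my own):

```shell
# Counts copied from the Bowtie2 summary above.
total=2648325
unique=309307   # aligned exactly 1 time
multi=97012     # aligned >1 times

# Overall alignment rate = (unique + multi) / total, as a percentage.
rate=$(awk -v u="$unique" -v m="$multi" -v t="$total" \
    'BEGIN { printf "%.2f", 100 * (u + m) / t }')

echo "aligned: $((unique + multi)) of $total reads (${rate}%)"
```

The remaining 2242006 reads are the ones reported as aligned 0 times, so the three categories add up to the total.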
I took the aligned and the unaligned sequences and used BLAT to check for contamination and to see whether the sequences are from the rat genome. All positive.
I don't have any clue as to why this is happening.
Can someone help, please?
Regards to all.