liz.marjory, 2 hours ago
Spain/Experimental Station of Zaidin

Hi everyone,
I have just started analyzing my RNA-seq data. I have my own reference genome and its annotation file. After some reading, I decided to use the following pipeline:

FastQC -> Trimmomatic -> Bowtie2 -> HTSeq

However, I have run into some problems. For example:

My reads are of good quality, but there are some duplicated sequences. Running Trimmomatic does not change this: the duplicated sequences are still there and I don't know why.

java -jar /path/trimmomatic.jar PE control_rep1_1.fq.gz control_rep1_2.fq.gz control_rep1_1_paired.fq.gz control_rep1_1_unpaired.fq.gz control_rep1_2_paired.fq.gz control_rep1_2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36

Despite that, I went ahead with the analysis just for practice.

When I ran HTSeq, the results looked like this:

Warning: Mate records missing for 26418 records; first such record: <SAM_Alignment object: Paired-end read 'A00877:83:HNHLVDSXX:3:1519:15230:29340' aligned to tig00000001:[616,766)/+>.
8382045 BAM alignment pairs processed.

no feature 7727004
ambiguous 0
Too low aQual 278161
not aligned 376880
alignment_not_unique 0
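
To put the summary above in perspective, here is a minimal Python sketch that turns the counts into percentages of the processed pairs. It assumes that the number of pairs processed equals the pairs assigned to a feature plus the special categories listed above; the function name summarize is only illustrative. Under that assumption the remainder comes out to zero, meaning no pair was assigned to any feature.

```python
# Counts copied from the HTSeq summary above.
total_pairs = 8382045
summary = {
    "no feature": 7727004,
    "ambiguous": 0,
    "too low aQual": 278161,
    "not aligned": 376880,
    "alignment not unique": 0,
}


def summarize(total, special):
    """Return (percent of total per category, pairs left over as assigned).

    Assumption: total = assigned pairs + sum of the special categories.
    """
    percents = {name: 100.0 * n / total for name, n in special.items()}
    assigned = total - sum(special.values())
    return percents, assigned


percents, assigned = summarize(total_pairs, summary)
for name, pct in percents.items():
    print(f"{name:22s}{pct:6.2f}%")
print(f"{'assigned to a feature':22s}{assigned}")
```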

Some forum posts suggest that the structure of the annotation file could be the cause.
Does anybody know what is happening?
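
One way to look at the annotation file structure is to list which sequence names and feature types it contains. By default htseq-count counts features of type exon and groups them by the gene_id attribute, so an annotation without exon lines, or with sequence names that differ from the reference (for example tig00000001), could plausibly explain a result where everything is "no feature". The sketch below is only an assumption about what such a check might look like: the function name summarize_annotation and the example lines (including the source name "maker") are hypothetical and are not taken from my real annotation.

```python
from collections import Counter


def summarize_annotation(lines):
    """Count sequence names (column 1) and feature types (column 3)
    in tab-separated GFF/GTF lines."""
    seqids, types = Counter(), Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column record
        seqids[fields[0]] += 1
        types[fields[2]] += 1
    return seqids, types


# Hypothetical example lines, made up for illustration only.
example = [
    "##gff-version 3\n",
    "tig00000001\tmaker\tgene\t500\t900\t.\t+\t.\tID=gene1\n",
    "tig00000001\tmaker\tCDS\t600\t800\t.\t+\t0\tID=cds1;Parent=gene1\n",
]

seqids, types = summarize_annotation(example)
print("feature types:", dict(types))
print("has exon lines:", "exon" in types)
print("sequence names:", sorted(seqids))
```

To run it on a real file, the list can be replaced by an open file handle, since the function only iterates over lines. The sequence names it reports can then be compared with the reference names in the BAM header.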

P.S. I really need help and would like to talk with someone, maybe over Skype or something similar. If anybody has a little free time, please let me know.
