I have just started analyzing my RNA-seq data. I have my own reference genome and its annotation file. After some reading, I decided to use the following pipeline:
FastQC -> Trimmomatic -> Bowtie2 -> HTSeq
However, I have run into some problems. For example:
My reads are of good quality, but there are some duplicated sequences. Running Trimmomatic does not change this: the duplicated sequences are still there, and I don't know why.
java -jar /path/trimmomatic.jar PE control_rep1_1.fq.gz control_rep1_2.fq.gz control_rep1_1_paired.fq.gz control_rep1_1_unpaired.fq.gz control_rep1_2_paired.fq.gz control_rep1_2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
Despite that, I went ahead with the analysis just for practice.
When I ran HTSeq, the results looked like this:
Warning: Mate records missing for 26418 records; first such record: <SAM_Alignment object: Paired-end read 'A00877:83:HNHLVDSXX:3:1519:15230:29340' aligned to tig00000001:[616,766)/+>.
8382045 BAM alignment pairs processed.
no feature 7727004
ambiguous 0
Too low aQual 278161
not aligned 376880
alignment_not_unique 0
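To put these numbers in proportion, here is a small Python sketch that expresses each category as a percentage of the processed pairs. The counts are copied from the output above; the function name `summarize` is my own. I am assuming that the "BAM alignment pairs processed" figure is the right denominator.

```python
# Counts copied from the htseq-count summary shown above.
summary = {
    "no feature": 7727004,
    "ambiguous": 0,
    "Too low aQual": 278161,
    "not aligned": 376880,
    "alignment_not_unique": 0,
}
# Assumption: "BAM alignment pairs processed" is the total to divide by.
total_pairs = 8382045


def summarize(counts, total):
    """Return each category as a percentage of total, plus whatever is left over."""
    percents = {name: 100.0 * n / total for name, n in counts.items()}
    # Pairs not in any special category are the ones counted for a feature.
    assigned = total - sum(counts.values())
    percents["assigned to a feature"] = 100.0 * assigned / total
    return percents


if __name__ == "__main__":
    for name, pct in summarize(summary, total_pairs).items():
        print(f"{name}: {pct:.2f}%")
```

If the arithmetic is right, the special categories add up to the full 8382045 pairs, which would mean that no pair at all was counted for any feature, and "no feature" alone accounts for roughly 92% of them.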
Some forum posts suggest that this could be caused by the structure of the annotation file.
Does anybody know what is happening?
P.S. I really need help and would like to talk with someone, maybe over Skype or something similar. If anybody has a little free time, please let me know.
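One way to check the annotation file might be to tally the sequence names (column 1) and feature types (column 3) and compare them with what the alignment and htseq-count expect. The Python sketch below assumes a tab-separated GFF/GTF file with nine columns. The function name `tally_annotation` and the two example records are hypothetical and are not taken from my real annotation.

```python
import collections
import io


def tally_annotation(handle):
    """Count sequence names (column 1) and feature types (column 3) in a GFF/GTF file."""
    seqnames = collections.Counter()
    feature_types = collections.Counter()
    for line in handle:
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip lines that do not have the nine tab-separated GFF/GTF columns.
        if len(fields) < 9:
            continue
        seqnames[fields[0]] += 1
        feature_types[fields[2]] += 1
    return seqnames, feature_types


# Hypothetical records, only to show the expected input shape.
example = io.StringIO(
    "##gff-version 3\n"
    "tig00000001\tmaker\tgene\t500\t900\t.\t+\t.\tID=gene1\n"
    "tig00000001\tmaker\tCDS\t500\t900\t.\t+\t0\tID=cds1;Parent=gene1\n"
)
seqnames, feature_types = tally_annotation(example)
print("sequence names:", dict(seqnames))
print("feature types:", dict(feature_types))
```

On the real file, the same function could be called on an opened file handle. By default htseq-count counts features of type `exon` and groups them by the `gene_id` attribute. If the tally shows no `exon` lines, or sequence names that differ from the contig names in the BAM (such as `tig00000001`), that would be consistent with the large "no feature" count. This is only a guess on my part.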