gravatar for rohitsatyam102

2 hours ago by

Hi All!

I am new to DEXSeq. When looking at the counts' files, I see a lot of reads are _empty assignment. From this thread, I learned these are the reads DEXSeq doesn't consider in the actual counts and are ones that fall in the intronic region. Since I am bound to use "-r no" option (because the gene I am interested in a gene that gets aggregated with 5 neighbor genes which are undesirable for my analysis), I see read assigned to _empty .

ENSG00000288588.1:002   0
ENSG00000288588.1:003   0
ENSG00000288588.1:004   0
ENSG00000288588.1:005   0
ENSG00000288588.1:006   0
_ambiguous      26889
_ambiguous_readpair_position    0
_empty  16171092
_lowaqual       0
_notaligned     0

Using '-r no' option

ENSG00000288588.1:002   0
ENSG00000288588.1:003   0
ENSG00000288588.1:004   0
ENSG00000288588.1:005   0
ENSG00000288588.1:006   0
_ambiguous      37555
_ambiguous_readpair_position    0
_empty  20575375
_lowaqual       0
_notaligned     0

My data contains ~63 million reads that are uniquely mappable and rest ~22 million are multi-mappable reads giving ~85 million mappable reads. Loosing 16-20 million reads means losing 1/4th of the data.

Relevant Codes

python ./DEXSeq/python_scripts/dexseq_prepare_annotation.py -r no /rnaseq/gencode.v33.annotation.gtf gencode.v33.gff 2> gff.stderr

python ./DEXSeq/python_scripts/dexseq_count.py -p yes -r pos -s reverse -a 10 gencode.v33.gff $p ${name}.Readcounts.txt 2> ${name}.stderr



Source link