featureCounts assignment low when mapping to "exon"

Newbie here. I am trying to use featureCounts to assign reads to features (exons). My reference is E. coli. I downloaded the GTF file from here: www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521 I am noticing that my % assigned is very low after running featureCounts. However, I also noticed that there are only ~300 exons listed in the GTF file. Is this correct for E. coli? I can't find any resources online to help me with this. Here is the featureCounts command I ran:

featureCounts -p -t exon -g gene -T 16 -s 2 -a GCF_000005845.2_ASM584v2_genomic.gtf -o counts.txt input_file.bam

Maybe I am not downloading the correct gtf file but I need this exact strain.





There is a lot of inconsistency when it comes to biological data. Different groups may choose to annotate their organisms differently.

In this case, only some non-coding RNA regions are annotated as exons, while for protein-coding genes only the coding sequences are annotated, as CDS features rather than exons. That is why the file lists so few exons.
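A quick way to check this without a genome browser is to tally the feature types in column 3 of the GTF. The sketch below writes a tiny made-up excerpt called sample.gtf (the file name, IDs, and coordinates are illustrative assumptions, not copied from the NCBI file) so that it runs anywhere, then counts the lines per feature type:

```shell
# Write a tiny, made-up GTF excerpt so the example is self-contained.
# IDs and coordinates are illustrative only, not taken from the real annotation.
printf 'chr\tRefSeq\tgene\t100\t300\t.\t+\t.\tgene_id "g1";\n'  > sample.gtf
printf 'chr\tRefSeq\tCDS\t100\t297\t.\t+\t0\tgene_id "g1";\n'  >> sample.gtf
printf 'chr\tRefSeq\tgene\t400\t900\t.\t-\t.\tgene_id "g2";\n' >> sample.gtf
printf 'chr\tRefSeq\tCDS\t403\t900\t.\t-\t0\tgene_id "g2";\n'  >> sample.gtf
printf 'chr\tRefSeq\tgene\t1000\t2500\t.\t+\t.\tgene_id "g3";\n' >> sample.gtf
printf 'chr\tRefSeq\texon\t1000\t2500\t.\t+\t.\tgene_id "g3";\n' >> sample.gtf

# Count lines per feature type (column 3), skipping comment lines.
type_counts=$(awk -F'\t' '!/^#/ { n[$3]++ } END { for (t in n) print t, n[t] }' sample.gtf | sort)
echo "$type_counts"
```

Running the same awk command on GCF_000005845.2_ASM584v2_genomic.gtf instead of sample.gtf should show whether CDS lines greatly outnumber exon lines in the real annotation.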

In short, use CDS instead of exon as the feature type (-t CDS) when counting with featureCounts.
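Applied to the command from the question, that would look roughly like the sketch below. Only the -t value is changed; every other option and file name is copied from the question. The output name counts_cds.txt and the guard around the call are additions for illustration. It is also worth confirming that the attribute passed to -g (here "gene", as in the question) is actually present on the CDS lines of the GTF; the featureCounts default for -g is gene_id.

```shell
# Feature type to count: CDS instead of the default "exon".
feature_type=CDS
# Attribute used to group features, kept from the question.
# Assumption: this attribute exists on the CDS lines; if not, try gene_id.
attribute=gene

# Run only when the tool and the input are available (guard added for illustration).
if command -v featureCounts >/dev/null 2>&1 && [ -f input_file.bam ]; then
    featureCounts -p -t "$feature_type" -g "$attribute" -T 16 -s 2 \
        -a GCF_000005845.2_ASM584v2_genomic.gtf \
        -o counts_cds.txt input_file.bam
else
    echo "featureCounts or input_file.bam not found; nothing was run" >&2
fi
```

The summary file that featureCounts writes next to the output (counts_cds.txt.summary) can then be compared with the one from the exon run to see whether the assigned fraction improves.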

Here is how to verify the statements. Using the bio package (see: www.bioinfo.help/) one can quickly visualize the following:

# Fetch the data from NCBI.
bio fetch NC_000913

# Get all features in GFF format.
bio convert NC_000913 --gff > all.gff

# Get exons only in GFF format.
bio convert NC_000913 --gff --type exon > exon.gff

# Get the reference file in FASTA format.
bio convert NC_000913 --fasta > ref.fa

Now visualize all three files in IGV to compare the exon track with the full annotation:

[Image: IGV view of ref.fa with the all.gff and exon.gff tracks]
