Hello,

Background:

I am analyzing data for an RNA-seq experiment made on a bacterial strain. I have a previously assembled genome that was annotated with PGAP. I processed my raw RNA-seq reads, aligned them to the genome with STAR, and quantified the alignments with HTseq. I fed the HTseq read counts to DESeq2 and now I have a table with the results, though I end up with a gene_id in the first column of the table and DESeq2 results.

Problem:
The gene_id is not very informative. I want the gene_id, the CDS start and end position (to control for gene length bias), and also the transcript product (protein) or function. I have this information in a GTF annotation file of course file but I can't find any sort of tools or code to add this information to my DESeq2 results.
DESeq2 results table

I thought it would be a simple task, but either it is not, or I am not understanding something properly. I have seen some tutorials using Biomart or other tools like that, but it seems like it's just for eukaryotes and it's using online databases. I guess I don't understand why one would not simply use the GTF annotation file?

My plan going forward is to annotate with GO terms or KEGG to do an enrichment analysis. The step I am stuck on seems easy but for some reason, I can't find a solution.

Any help is welcome, Thank you.



Source link