2 hours ago by PierQ

Hi,

I am new to bioinformatics and I am not very familiar with the human genome.

I mapped my RNA-seq reads to the human genome using HISAT2.

After mapping, I would like to count, for each gene, the number of reads that map to exons and the number that map to introns. What I am missing is a GTF or BED file with the exon and intron coordinates for all protein-coding genes.
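Introns are not usually listed in a GTF, but they can be derived as the gaps between consecutive exons of a transcript. The following is a minimal Python sketch of that idea, not a tested pipeline. The function name `introns_from_exons` and the coordinates in the usage line are hypothetical, and it assumes 1-based inclusive exon coordinates for a single transcript, as in a GTF.

```python
def introns_from_exons(exons):
    """Return the intron intervals between consecutive exons of one transcript.

    exons: iterable of (start, end) tuples, 1-based and inclusive (GTF style).
    The result uses the same coordinate convention.
    """
    ordered = sorted(exons)  # sort by start so neighbours are adjacent exons
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit a gap if the two exons are not touching or overlapping.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates, for illustration only.
print(introns_from_exons([(100, 200), (301, 400), (501, 600)]))
```

This works per transcript. For a gene with several isoforms, an intron of one transcript can overlap an exon of another, so how to merge them at the gene level is a separate decision.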

I am trying to use the Table Browser at genome.ucsc.edu/cgi-bin/hgTables (as described here: www.biostars.org/p/13290/), and I get a BED file in the following format:

chr12 6534569 6534809 ENST00000496049.1_intron_0_0_chr12_6534570_f 0 +
chr12 6534569 6534809 ENST00000229239.10_intron_0_0_chr12_6534570_f 0 +
chr12 6534861 6536493 ENST00000229239.10_intron_1_0_chr12_6534862_f 0 +
chr12 6536593 6536683 ENST00000229239.10_intron_2_0_chr12_6536594_f 0 +
chr12 6536790 6536919 ENST00000229239.10_intron_3_0_chr12_6536791_f 0 +
chr12 6537010 6537100 ENST00000229239.10_intron_4_0_chr12_6537011_f 0 +
chr12 6537216 6537308 ENST00000229239.10_intron_5_0_chr12_6537217_f 0 +
chr12 6537390 6537583 ENST00000229239.10_intron_6_0_chr12_6537391_f 0 +

As you can see, the name column contains only the transcript identifier (ENST...). To assign all intronic reads to a specific gene, I need the gene name or the gene identifier (starting with ENSG) for each interval. I would then use featureCounts to count the reads per feature.
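One possible way to attach gene identifiers is to build a transcript-to-gene lookup from the attribute column of a GTF and apply it to the BED name field. The sketch below is an assumption about how this could be done, not a verified workflow. The function names `transcript_to_gene` and `bed_to_saf` are hypothetical, and it assumes that the GTF `transcript_id` values carry the same version suffix as the BED names (e.g. `ENST00000229239.10`). It writes rows in the tab-delimited SAF layout (GeneID, Chr, Start, End, Strand) that featureCounts can read as an annotation, converting the 0-based BED start to a 1-based start.

```python
import re

# Attribute patterns for column 9 of a GTF line.
TRANSCRIPT_RE = re.compile(r'\btranscript_id "([^"]+)"')
GENE_RE = re.compile(r'\bgene_id "([^"]+)"')


def transcript_to_gene(gtf_lines):
    """Build a {transcript_id: gene_id} dict from GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        transcript = TRANSCRIPT_RE.search(fields[8])
        gene = GENE_RE.search(fields[8])
        if transcript and gene:
            mapping[transcript.group(1)] = gene.group(1)
    return mapping


def bed_to_saf(bed_lines, mapping):
    """Yield SAF rows (GeneID, Chr, Start, End, Strand) for UCSC intron BED lines.

    Assumes BED6 lines whose name field looks like '<transcript>_intron_...'.
    Lines whose transcript is not in the mapping are skipped.
    """
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        chrom, start, end, name, _score, strand = fields[:6]
        transcript = name.split("_intron_")[0]
        gene = mapping.get(transcript)
        if gene is None:
            continue
        # BED starts are 0-based; SAF starts are 1-based. Ends are unchanged.
        yield "\t".join([gene, chrom, str(int(start) + 1), end, strand])
```

A caller would write the header line `GeneID`, `Chr`, `Start`, `End`, `Strand` (tab-separated) followed by these rows to a file, then pass that file to featureCounts as a SAF annotation.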

Do you have any idea how to do that? My approach may well be completely wrong.

Thank you in advance


modified 1 hour ago
