Posted 2 hours ago by yancychy

Hi,
I want to get the 5'-UTR and 3'-UTR coordinates from the annotation file. I downloaded the "All GENCODE VM24" bed file from the UCSC Genome Browser. The format of the file is as follows (one record shown):

bin            1023
name           ENSMUST00000000090.7
chrom          chr9
strand         +
txStart        57521278
txEnd          57532426
cdsStart       57521327
cdsEnd         57531782
exonCount      5
exonStarts     57521278,57528955,57530240,57531668,57532281,
exonEnds       57521415,57529072,57530362,57531792,57532426,
score          0
name2          Cox5a
cdsStartStat   cmpl
cdsEndStat     cmpl
exonFrames     0,1,1,0,-1,
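For reference, a record like this can be read into a small dictionary before doing any arithmetic on it. This is a minimal sketch, assuming the file is tab-separated and the columns appear in the order shown above; `FIELDS` and `parse_row` are hypothetical names used only for illustration.

```python
# Column order as shown in the record above (assumed to match the file).
FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]

def parse_row(line):
    """Split one tab-separated row and convert the coordinate fields to ints."""
    rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        rec[key] = int(rec[key])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    for key in ("exonStarts", "exonEnds"):
        rec[key] = [int(x) for x in rec[key].split(",") if x]
    return rec

# The example record from the post, rebuilt as a tab-separated line.
row = "\t".join([
    "1023", "ENSMUST00000000090.7", "chr9", "+", "57521278", "57532426",
    "57521327", "57531782", "5",
    "57521278,57528955,57530240,57531668,57532281,",
    "57521415,57529072,57530362,57531792,57532426,",
    "0", "Cox5a", "cmpl", "cmpl", "0,1,1,0,-1,",
])
rec = parse_row(row)
print(rec["name"], rec["exonCount"], rec["exonStarts"][0], rec["exonEnds"][-1])
```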

For the 5'UTR, I compute the length as cdsStart - txStart = 57521327 - 57521278 = 49, covering (57521278, 57521327].

For the 3'UTR, I compute the length as txEnd - cdsEnd = 57532426 - 57531782 = 644, covering (57531782, 57532426].

However, the total length of all exons (the whole transcript) is only 645:
useast.ensembl.org/Mus_musculus/Transcript/Sequence_cDNA?db=core;g=ENSMUSG00000000088;r=9:57521279-57532426;t=ENSMUST00000000090
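One way to check these numbers is to count only the bases that fall inside exons, because txEnd - cdsEnd is a genomic span and includes any intron lying between cdsEnd and txEnd. The sketch below does this for the record above, assuming the UCSC table coordinates are 0-based and half-open and that the UTRs swap sides on the minus strand; `overlap` and `utr_lengths` are hypothetical helper names, not part of any library.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def utr_lengths(strand, tx_start, tx_end, cds_start, cds_end, exon_starts, exon_ends):
    """Exon-only (spliced) lengths of the transcript, both UTRs and the CDS."""
    exons = list(zip(exon_starts, exon_ends))
    total = sum(end - start for start, end in exons)
    # Exonic bases to the left of cdsStart and to the right of cdsEnd.
    left = sum(overlap(s, e, tx_start, cds_start) for s, e in exons)
    right = sum(overlap(s, e, cds_end, tx_end) for s, e in exons)
    # On the minus strand the 5'UTR lies at the txEnd side.
    utr5, utr3 = (left, right) if strand == "+" else (right, left)
    return {"transcript": total, "utr5": utr5, "utr3": utr3, "cds": total - left - right}

exon_starts = [57521278, 57528955, 57530240, 57531668, 57532281]
exon_ends = [57521415, 57529072, 57530362, 57531792, 57532426]

result = utr_lengths("+", 57521278, 57532426, 57521327, 57531782, exon_starts, exon_ends)
print(result)
# {'transcript': 645, 'utr5': 49, 'utr3': 155, 'cds': 441}
```

With these assumptions, the 5'UTR stays at 49 because it lies within the first exon, while the 3'UTR comes out as 155 exonic bases: 10 at the end of the fourth exon plus the 145-base fifth exon. The remaining 489 bases of the 644-base genomic span belong to the intron between those two exons.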

Is something wrong with my calculation?

In addition, I find that the 5'UTR or 3'UTR length is 1 for some transcripts. Is that reasonable? In the linked post "A: Easy Way To Get 3' Utr Lengths Of A List Of Genes", the 3'UTR length of OR4F5 is 0 and 1. Two examples:

5utr ensembl_transcript_id  
A    ENSMUST00000054837

3utr ensembl_transcript_id   
G    ENSMUST00000073261


