One isoform of a gene can be selected as "canonical", based on experimental evidence collected in a database called APPRIS, or based on transcript length or other criteria. The other isoforms exist because a gene can be transcribed in different ways. For analyses, it can be useful to pick one isoform per gene from all the ones available.

When you look at GFF3 files on the GENCODE site, for instance, you may see some entries tagged with appris_* values that denote the grade of evidence for "canonical-ity".
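You can list the APPRIS-tagged transcripts in a GENCODE GFF3 file yourself with a small shell function wrapping awk. This is a minimal sketch, not a GENCODE tool. It assumes GENCODE's GFF3 layout: column 3 holds the feature type, and column 9 holds semicolon-separated key=value attributes, including transcript_id= and a comma-separated tag= list such as tag=basic,appris_principal_1,CCDS. The function name appris_transcripts is hypothetical.

```shell
# Print "transcript_id<TAB>appris_tag" for every GFF3 transcript record on
# stdin that carries an appris_* value in its tag= attribute.
# Assumes GENCODE-style column 9 attributes (hypothetical helper name).
appris_transcripts() {
  awk -F'\t' '
    $3 == "transcript" {
      id = ""; grade = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        # "transcript_id=" is 14 characters long, so the value starts at 15
        if (attrs[i] ~ /^transcript_id=/) id = substr(attrs[i], 15)
        if (attrs[i] ~ /^tag=/) {
          # "tag=" is 4 characters long; the tag list is comma-separated
          m = split(substr(attrs[i], 5), tags, ",")
          for (j = 1; j <= m; j++) if (tags[j] ~ /^appris_/) grade = tags[j]
        }
      }
      if (grade != "") print id "\t" grade
    }'
}
```

For example, `% gunzip -c gencode.annotation.gff3.gz | appris_transcripts`, where gencode.annotation.gff3.gz stands in for whichever GENCODE GFF3 file you downloaded. Only transcript lines are checked, so exon and CDS records that repeat the same tags are not listed twice.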

In the UCSC browser, the canonical isoform may, for example, be the one experimentally determined to be expressed most highly among the alternative transcripts. It is drawn with an inverted text label as a visual cue that it is canonical. The labels of the other isoforms are unadorned.

Whether you should work with the canonical gene annotation depends on your experiment.

Internally, UCSC keeps a table called knownCanonical that is used to label these isoforms. The table can be downloaded for direct inspection from the goldenPath directory of the UCSC download server for various assemblies, e.g. hg38.

In hg38, for example, the XIST gene has an isoform called ENST00000429829.6 that is labeled as canonical and sits at chrX:73820655-73852723. These table coordinates are 0-based and half-open, so the UCSC browser displays the same interval 1-based as chrX:73,820,656-73,852,723.

You can grab the knownCanonical table and verify that this transcript is there:

% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep -F ENST00000429829.6
chrX    73820655    73852723    28961   ENST00000429829.6   ENSG00000229807.12
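Going from the table's 0-based, half-open coordinates to what the browser displays only requires moving the start by one. Here is a minimal sketch, assuming whitespace-separated knownCanonical rows on stdin in the column order seen above (chrom, chromStart, chromEnd, clusterId, transcript, then a gene ID). The function name to_browser_coords is hypothetical.

```shell
# Reformat knownCanonical rows read from stdin as "transcript<TAB>chrom:start-end"
# in the browser's 1-based, fully closed convention. Table starts are 0-based
# and table ends are exclusive, so only the start is shifted by +1.
to_browser_coords() {
  awk '{ printf "%s\t%s:%d-%d\n", $5, $1, $2 + 1, $3 }'
}
```

Piping the XIST row above through it (append `| to_browser_coords` to the previous command) gives ENST00000429829.6 followed by chrX:73820656-73852723, which you can paste into the browser's position box.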

If you do the same for the canonical-labeled MYC isoform in hg38, you should see a similar result:

% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | grep -F ENST00000621592.8
chr8    127736230   127742951   7390    ENST00000621592.8   ENSG00000136997.21
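Version suffixes such as .6, .12 or .21 can change between GENCODE releases, so grepping for a versioned ID may miss the row in a newer copy of the table. Here is a minimal sketch that looks up canonical transcripts by unversioned gene ID instead. It assumes the sixth column holds the Ensembl gene ID, as in the hg38 rows above. canonical_for_genes is a hypothetical helper name.

```shell
# Print "gene<TAB>transcript" for every knownCanonical row (read from stdin)
# whose gene ID matches one of the arguments. Version suffixes are stripped
# from both the arguments and column 6 before comparing.
canonical_for_genes() {
  awk -v ids="$*" '
    BEGIN {
      n = split(ids, list, " ")
      for (i = 1; i <= n; i++) {
        id = list[i]
        sub(/\.[0-9]+$/, "", id)   # ENSG00000229807.12 -> ENSG00000229807
        want[id] = 1
      }
    }
    {
      gene = $6
      sub(/\.[0-9]+$/, "", gene)
      if (gene in want) print gene "\t" $5
    }'
}
```

For example, `% wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz" | gunzip -c | canonical_for_genes ENSG00000229807 ENSG00000136997` should list the canonical transcripts for XIST and MYC, whatever their current version suffixes. The wanted IDs are held in an awk array, so each row costs one hash lookup no matter how many genes you ask for.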


