Posted 2 hours ago by paul.jaschke

Hi all,
Using Gviz, I am trying to create a figure with the exons of a gene in one track and, below that, the protein domains mapped to the genome sequence. I can map the transcript/exon information easily by grabbing it from UCSC with the UcscTrack() function, but I cannot seem to properly extract the data from the protein domain table 'unipDomain', as described in the UCSC table schema.

The problem seems to be how the data is organized within the table. There is no easy 1:1 mapping from the 'chromStart' and 'chromEnd' columns to the features I want to draw, because each row holds several blocks. The 'chromStarts' column stores comma-separated block start offsets relative to 'chromStart', and the 'blockSizes' column stores the matching block widths, also comma-separated. So each block starts at chromStart plus its offset and extends for its block size. I don't see any way to do this with the GeneRegionTrack, AnnotationTrack, or DataTrack classes. Any help would be greatly appreciated, either solving this problem or pointing me towards an easier way to represent domain information on the chromosome. Thanks!
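To make the offset arithmetic concrete, here is a minimal base-R sketch of how one such row could be expanded into one start/end pair per block. The function name `expand_blocks` and the input values are hypothetical and made up for illustration; they are not real 'unipDomain' rows. The `+ 1L` assumes the table uses UCSC's usual 0-based, half-open coordinates and that 1-based starts are wanted, which I have not verified for this particular table.

```r
# Expand one row with comma-separated block offsets and sizes
# into a data frame with one (start, end) pair per block.
# expand_blocks is a hypothetical helper, not part of Gviz.
expand_blocks <- function(chromStart, blockSizes, chromStarts,
                          name = NA_character_) {
  # UCSC list columns often end with a trailing comma; strsplit drops it
  sizes   <- as.integer(strsplit(blockSizes, ",", fixed = TRUE)[[1]])
  offsets <- as.integer(strsplit(chromStarts, ",", fixed = TRUE)[[1]])
  stopifnot(length(sizes) == length(offsets))

  data.frame(
    # Assumption: input is 0-based half-open, output is 1-based inclusive
    start   = chromStart + offsets + 1L,
    end     = chromStart + offsets + sizes,
    feature = name,
    stringsAsFactors = FALSE
  )
}

# Made-up values for illustration only
blocks <- expand_blocks(1000L, "50,30,", "0,200,", name = "exampleDomain")
print(blocks)
```

If something like this works, the resulting per-block starts and ends could presumably be handed to AnnotationTrack() via its start, end, and group arguments, so that blocks sharing a domain name are drawn as one connected feature. Fetching the raw table rows in the first place is not shown here.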

This is what I would expect it to look like, based on the track in the UCSC browser: [image: UCSC browser track]

This is what mine looks like (one big box instead of being split along with the exons): [image: Gviz plot]

Code used to generate the image above:


library(Gviz)

gen <- "hg38"
chr <- "chr11"

# Create the ideogram track
itrack <- IdeogramTrack(genome = gen, chromosome = chr)

# Create the genome axis track
gtrack <- GenomeAxisTrack()

## example of FKBP2
from <- 64240500
to <- 64244500
knownGenes <- UcscTrack(genome = "hg38", 
                        chromosome = "chr11",
                        track = "knownGene", 
                        from = from,
                        to = to, 
                        trackType = "GeneRegionTrack",
                        rstarts = "exonStarts", 
                        rends = "exonEnds", 
                        gene = "name",
                        symbol = "name",
                        transcript = "name",
                        strand = "strand",
                        fill = "#8282d2",
                        name = "UCSC Genes")

domains <- UcscTrack(genome = "hg38",
                     chromosome = "chr11",
                     track = "uniprot",
                     table = "unipDomain",
                     from = from,
                     to = to,
                     trackType = "GeneRegionTrack",
                     rstarts = "chromStart",
                     rends = "chromEnd",
                     gene = "name",
                     symbol = "name",
                     strand = "strand",
                     name = "domain")

plotTracks(list(itrack, gtrack, knownGenes, domains), transcriptAnnotation = "gene")
