Bosberg (Germany, MPI), 2 hours ago

In the GenomicRanges package for R, is there an option in setdiff that prevents adjacent ranges from being collapsed? For example, if I have the following:

gr1 = GRanges object 
  [1]     chr1      1-10      *
  [2]     chr1     11-20      *
  [3]     chr1     21-30      *

gr2 = GRanges object
  [1]     chr1     18-25      *

and I take setdiff:

What I get:

setdiff( gr1, gr2 )
  [1]     chr1      1-17      *      # <---- reduce happens automatically.
  [2]     chr1     26-30      *

The output stops at 17 and picks up again at 26, avoiding the 18-25 range in gr2, so that much is correct. Unfortunately, the output now has a single continuous range from 1 to 17 instead of two adjacent ranges, 1-10 and 11-17. A reduce operation is being applied automatically, and I want to suppress it.

What I want:

setdiff( gr1, gr2, <Some_option_to_suppress_reduce> )
  [1]     chr1      1-10      *
  [2]     chr1     11-17      *
  [3]     chr1     26-30      *

I don’t want these first two regions to be reduced into one.

What I've tried:

The best solution I've come up with so far is to run setdiff on each range separately, collect the results in a list, and convert back to a single GRanges object, like this:

unlist( GRangesList( lapply( seq_along(gr1), function(i) setdiff( gr1[i], gr2 ) ) ) )

This does what I want, but with all the conversion between data types it is really slow. Is there an option to turn off reduce directly (or some other, more elegant solution)?
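For anyone who wants to reproduce this, here is a minimal sketch of the example and of my current workaround. It assumes the coordinates shown above and an unspecified strand; the names `per_range` and `result` are just illustrative.

```r
library(GenomicRanges)

# Example ranges from the question (strand left as "*")
gr1 <- GRanges("chr1", IRanges(start = c(1, 11, 21), end = c(10, 20, 30)))
gr2 <- GRanges("chr1", IRanges(start = 18, end = 25))

# Default behaviour: adjacent pieces of gr1 come back merged (1-17, 26-30)
setdiff(gr1, gr2)

# Workaround: take the difference one range at a time, so pieces that
# come from different gr1 ranges are never merged with each other
per_range <- lapply(seq_along(gr1), function(i) setdiff(gr1[i], gr2))
result <- unlist(GRangesList(per_range))
result
```

On this toy input the workaround returns the three ranges I want (1-10, 11-17, 26-30); the problem is only how it scales when gr1 is large.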


modified 1 hour ago
