gravatar for Alex Reynolds

1 hour ago by

Seattle, WA USA

Here is a general approach with BEDOPS bedmap --count, which generalizes to N internally-disjoint (and sort-bed-sorted) input files:

$ bedops --everything 1.bed 2.bed 3.bed ... N.bed 
    | bedmap --count --echo --delim 't' - 
    | uniq 
    | awk -v OVERLAPS=2 '$1 >= OVERLAPS' 
    | cut -f2- 
    > common.bed

The bedmap step uses, by default, overlap of one or more bases for inclusion. You can modify this threshold to be more stringent. The readthedocs site has details, or run bedmap --help for a shorter overview.

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to exactly, less than, or greater than N inputs.

Once you have the elements which meet your overlap threshold, you can then process those elements to get their overlapping regions via a final operation:

$ bedops --partition common.bed | bedmap --count --echo --delim 't' - common.bed | awk '$1 >= 2' | cut -f2- > overlaps.bed

If your inputs are not internally disjoint — if some elements may overlap within any one of the N input files — you might instead apply some ID-based tricks I describe in my answer over here: A: Intersect multiple BED files

Source link