Bedtools window: how to identify features near genes when the window size should count only host sequence (excluding features)?


I am trying to annotate some genomic features that are found within 20 kb of host genes. I have a BED file containing the genes, and another with the features. So far, I have been able to identify the features that fall within 20 kb of each gene using bedtools window:

bedtools window -w 20000 -a genes.bed -b features.bed

However, I am now exploring how these features have altered the landscape around genes. To do this, I want to identify all the features found within 20 kb of HOST sequence, i.e. if there were 5,000 bp of features within the 20 kb window, I would like the window to be 25 kb. The window then needs to keep expanding until 20 kb of host sequence has been covered, excluding any sequence contributed by features.
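To make the intended behaviour concrete, here is a rough sketch in plain Python (not bedtools) of what I mean by a host-only window on one side of a gene. It assumes BED-style 0-based, half-open coordinates and a list of feature intervals from a single chromosome. The function names (merge_intervals, extend_downstream, extend_upstream) are my own and purely illustrative, and the sketch does not clamp at the chromosome end or deal with neighbouring genes.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def extend_downstream(pos, host_bp, features):
    """Walk right from `pos` until `host_bp` non-feature bases are covered.

    Returns the half-open end coordinate of the expanded window.
    """
    remaining = host_bp
    for start, end in merge_intervals(features):
        if end <= pos:
            continue  # feature lies entirely to the left of the start point
        gap = max(start, pos) - pos  # host bases before this feature begins
        if gap >= remaining:
            break  # the window ends before (or exactly at) this feature
        remaining -= gap
        pos = end  # skip over the feature without counting its bases
    return pos + remaining


def extend_upstream(pos, host_bp, features):
    """Walk left from `pos` by mirroring coordinates; clamps at 0."""
    mirrored = [(-end, -start) for start, end in features]
    return max(0, -extend_downstream(-pos, host_bp, mirrored))


if __name__ == "__main__":
    # Hypothetical gene at [50000, 60000) with one 5 kb feature on each side.
    features = [(40000, 45000), (65000, 70000)]
    window_start = extend_upstream(50000, 20000, features)
    window_end = extend_downstream(60000, 20000, features)
    print(window_start, window_end)
```

With one 5 kb feature in each flank, both flanks grow from 20 kb to 25 kb, which is the behaviour described above. The expanded coordinates could then be written back out as a BED file and intersected with features.bed.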

I had thought about doing this iteratively using bedtools window: start with 20 kb, count the feature bases in each flank, then extend each flank by that many bases and repeat. But I have hit issues with the downstream flank of one gene running into the upstream flank of another, and with flanks running into neighbouring gene loci.

I am not really sure where to start, or what the best solution for this might be. I would be really grateful for any pointers or solutions that anyone might come up with!






