bedtools closest


I have a reference BED file of protein-coding genes and a second BED file of ncRNAs.

I want to compute the distance from each ncRNA to the nearest protein-coding gene, and remove every ncRNA located within 250 bp of a protein-coding gene on the same strand.

Can anyone help?



If you can use the BEDOPS suite, its closest-features tool can solve this:

  1. Separate reference and ncRNA files by strand and sort:
$ awk -v FS="\t" -v OFS="\t" '($6 == "+")' reference.bed | sort-bed - > reference.for.bed
$ awk -v FS="\t" -v OFS="\t" '($6 == "-")' reference.bed | sort-bed - > reference.rev.bed
$ awk -v FS="\t" -v OFS="\t" '($6 == "+")' ncRNA.bed | sort-bed - > ncRNA.for.bed
$ awk -v FS="\t" -v OFS="\t" '($6 == "-")' ncRNA.bed | sort-bed - > ncRNA.rev.bed
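  1. For each forward-strand ncRNA, find the closest forward-strand reference element and keep the ncRNA only if that element is more than 250 nt away. Give the ncRNA file first, so that each output line (ncRNA|closest-reference|distance) describes one ncRNA. The signed distance from --dist --closest is negative or positive depending on which side the reference element lies, and 0 for an overlap, so test both signs. Printing only the first field keeps the result in BED format, and ncRNAs with no same-strand reference element on their chromosome (distance NA) are kept. Note that this is the gap to the nearest end of the gene, not to its TSS:
$ closest-features --dist --closest ncRNA.for.bed reference.for.bed | awk -v FS="|" '($3 == "NA" || $3 > 250 || $3 < -250) { print $1 }' > filtered.for.bed
  1. Repeat for reverse-strand elements. The same threshold applies, because the distance is measured between the nearest edges of the two elements, whatever their strand:
$ closest-features --dist --closest ncRNA.rev.bed reference.rev.bed | awk -v FS="|" '($3 == "NA" || $3 > 250 || $3 < -250) { print $1 }' > filtered.rev.bed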
  1. Take the union of the filtered results:
$ bedops --everything filtered.for.bed filtered.rev.bed > filtered.bed
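The distance filter can be tried on its own, without BEDOPS installed. Below is a minimal POSIX sh sketch; keep_distant and the example rows are hypothetical, made up for illustration. It assumes closest-features was run with the ncRNA file as the first argument, so each line reads ncRNA|closest reference element|signed distance, and that an ncRNA with no same-strand reference element on its chromosome gets NA as its distance. It keeps an ncRNA only when that distance is more than the threshold (250 bp by default) in either direction, and prints just the ncRNA's BED fields.

```shell
#!/bin/sh
# keep_distant: hypothetical helper, NOT a BEDOPS command.
# Input lines are assumed to look like the output of
#   closest-features --dist --closest ncRNA.bed reference.bed
# i.e. "ncRNA-element|closest-reference-element|signed-distance".
# Prints field 1 (the ncRNA BED line) when the nearest same-strand
# reference element is more than MIN_GAP bp away (default 250), or when
# no reference element was found (assumed to be reported as "NA").
keep_distant() {
    awk -v FS='|' -v min_gap="${1:-250}" '
        BEGIN { min_gap += 0 }                # force a numeric threshold
        $3 == "NA" { print $1; next }         # no neighbour: keep the ncRNA
        {
            d = $3 + 0                        # signed distance as a number
            if (d < 0) d = -d                 # treat both sides alike
            if (d > min_gap) print $1         # gaps of 0..min_gap are dropped
        }'
}

# Made-up example rows; the BED fields inside each "|" section are tab-separated.
example_rows() {
    printf 'chr1\t1000\t1100\tncA\t0\t+|chr1\t1200\t5000\tgene1\t0\t+|100\n'
    printf 'chr1\t6000\t6100\tncB\t0\t+|chr1\t1200\t5000\tgene1\t0\t+|-1000\n'
    printf 'chr1\t7000\t7100\tncC\t0\t+|chr1\t6900\t9000\tgene2\t0\t+|0\n'
    printf 'chr2\t500\t600\tncD\t0\t+|NA|NA\n'
}

# Only ncB (1000 bp away) and ncD (no gene on chr2) survive the 250 bp cut.
example_rows | keep_distant 250
```

On real data the helper would take the place of the inline awk filter, for example:
$ closest-features --dist --closest ncRNA.for.bed reference.for.bed | keep_distant 250 > filtered.for.bed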

