Converting a VCF with SNPs and indels to BED format


Does anyone know of tools or have custom scripts (aside from "vcf2bed") that can convert a VCF containing both indels and SNPs into BED format? The tricky part is creating the correct BED regions to capture the indel variants, and I haven't been able to find anything online so far.




The vcf2bed script will handle the last three issues:

  • Use --deletions to convert deletions into one set
  • Use --snvs to convert single-nucleotide variants into a second set
  • A VCF line with multiple alternate alleles is converted into multiple BED lines

You can use bedops --everything to union the deletion and single-base variant sets:

$ vcf2bed --deletions < foo.vcf > foo_deletions.bed
$ vcf2bed --snvs < foo.vcf > foo_snvs.bed
$ bedops --everything foo_{deletions,snvs}.bed > foo.bed

Once you have your deletions and SNVs in one file, it's easy to deal with the first issue. The conversion script maps the REF and ALT fields of the VCF input to the sixth and seventh columns of the BED output. So you can filter out lines from the BED output where those columns are equivalent, using a simple awk statement:

$ awk '$6 != $7' foo.bed > foo_filtered.bed
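To illustrate what that filter does, here is a minimal sketch on two made-up BED rows. It assumes only what is stated above, namely that REF and ALT sit in columns 6 and 7; the function name drop_ref_equals_alt and the sample rows are hypothetical, not vcf2bed output.

```shell
# Hypothetical helper: keep only rows whose REF (col 6) differs from ALT (col 7).
drop_ref_equals_alt() {
  awk -F'\t' '$6 != $7'
}

# Two made-up rows: the first is a real substitution (A -> G),
# the second has identical REF and ALT (C -> C) and should be dropped.
printf 'chr1\t99\t100\trs1\t50\tA\tG\nchr1\t199\t200\trs2\t50\tC\tC\n' \
  | drop_ref_equals_alt
```

Only the rs1 row should be printed, since it is the only row where the two columns differ.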

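Try the following awk one-liner, which skips header lines and prints a BED interval spanning the longer of REF and ALT:

awk '!/^#/' input.vcf | awk '{if(length($4) > length($5)) print $1"\t"($2-1)"\t"($2+length($4)-1); else print $1"\t"($2-1)"\t"($2+length($5)-1)}' > output.bed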

This should work!
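As a quick check of the coordinate arithmetic, here is a sketch of the same idea run on a tiny, made-up VCF with one SNP, one deletion, and one insertion. The function name vcf_to_bed_spans and the sample records are hypothetical. Like the one-liner above, it treats ALT as a single string, so a comma-separated multi-allelic ALT would be measured at its full length.

```shell
# Hypothetical helper: VCF on stdin -> 3-column BED on stdout.
# BED start is POS-1 (0-based); the end extends by the longer of REF and ALT.
vcf_to_bed_spans() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      span = (length($4) > length($5)) ? length($4) : length($5)
      print $1, $2 - 1, $2 - 1 + span
    }'
}

# Made-up records: SNP (A>G), deletion (ACGT>A), insertion (A>ACG).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tACGT\tA\nchr1\t300\t.\tA\tACG\n' \
  | vcf_to_bed_spans
```

For these records the intervals come out as chr1 99 100, chr1 199 203, and chr1 299 302 (tab-separated). Whether an insertion should span the ALT length or be a single-base anchor depends on how the BED file will be used downstream.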




