Generating a Positional List from a VCF


I'd like to generate a 1-based position list from a VCF file for all variants.
I believe that, by VCF convention, the POS column gives the position of the substituted base itself for a single-nucleotide substitution, but the position of the preceding base for both insertions and deletions.

So I thought that, to give the position of each variant as a start and an end, a script could take the position N provided by the VCF and convert it as follows:

Insertion: start = N, end = N+1
SNP: start = N, end = N
Deletion: start = N+1, end = N+length(REF)-1

So for the following sample:

CHROM   POS             REF     ALT
11      66091886        T       TTTC
11      66108375        T       G
11      67180763        GTATT   G

It becomes:

CHROM   START           END 
11      66091886        66091887
11      66108375        66108375
11      67180764        67180767
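
For reference, the conversion above can be sketched in Python roughly as follows. This is only a minimal sketch: `variant_span` is a name made up for illustration, and it assumes biallelic records with simple left-anchored indels (REF and ALT share their first base, and one of them is exactly one base long). Anything else, such as multi-allelic sites or MNPs, is rejected rather than guessed at.

```python
def variant_span(pos, ref, alt):
    """Return (start, end), 1-based inclusive, for a simple biallelic variant.

    Assumption: indels are left-anchored, i.e. REF and ALT share their
    first base, as in the sample records of this post.
    """
    if len(ref) == 1 and len(alt) == 1:
        # SNP: POS is the substituted base itself.
        return pos, pos
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        # Insertion: report the two reference bases flanking the inserted sequence.
        return pos, pos + 1
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        # Deletion: skip the anchor base and report the deleted bases only.
        return pos + 1, pos + len(ref) - 1
    raise ValueError(f"unsupported variant: POS={pos} REF={ref} ALT={alt}")


# The three sample records from this post (CHROM, POS, REF, ALT).
records = [
    ("11", 66091886, "T", "TTTC"),
    ("11", 66108375, "T", "G"),
    ("11", 67180763, "GTATT", "G"),
]

rows = [(chrom, *variant_span(pos, ref, alt)) for chrom, pos, ref, alt in records]

print("CHROM", "START", "END", sep="\t")
for chrom, start, end in rows:
    print(chrom, start, end, sep="\t")
```

Run on the three sample records, this gives the same START and END values as the table above.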

Just wondering whether I have gone about this correctly, and whether this method would in fact specify where in my alignment the variant itself occurs?


