vcf2maf is the standard tool for converting between maf and vcf files. However, I am encountering situations where the notation for the same mutation differs between the maf and vcf formats.

For example, I have a mutation in maf format that looks like this:

15      66729115        66729129        +       In_Frame_Del    DEL     GGAACCAGATCATAA      -

After converting to vcf format, the mutation is now written like this:

15       66729114        .       CGGAACCAGATCATAA        C

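The POS value has been changed, along with the ref and alt alleles, to match the requirements of the vcf spec: a deletion is written with the preceding reference base included, so POS shifts back by one and that base (here C) is prepended to both alleles.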
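To make the shift concrete, here is a minimal Python sketch of how the deletion above maps from maf to vcf notation. The function name `maf_deletion_to_vcf` is hypothetical and is not part of vcf2maf; the preceding reference base is passed in by hand, whereas a real converter would look it up in the reference genome.

```python
def maf_deletion_to_vcf(start, ref, preceding_base):
    """Convert a maf-style deletion (alt allele '-') to vcf POS/REF/ALT.

    preceding_base is the reference base at position start - 1. It is
    supplied by the caller here (an assumption for illustration); in
    practice it has to come from the reference fasta.
    """
    pos = start - 1                  # vcf POS points at the anchor base
    vcf_ref = preceding_base + ref   # anchor base plus the deleted bases
    vcf_alt = preceding_base         # only the anchor base remains
    return pos, vcf_ref, vcf_alt


pos, ref, alt = maf_deletion_to_vcf(66729115, "GGAACCAGATCATAA", "C")
print(pos, ref, alt)  # 66729114 CGGAACCAGATCATAA C
```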

What I need is a method for converting maf <-> vcf that retains the original coordinates in the output file. In other words, I want an output file that looks something like this, where both the old and new mutation notations are recorded:

15   66729114   CGGAACCAGATCATAA    C   15   66729115    66729129   GGAACCAGATCATAA      -

For context, I have maf files with extra metadata associated with each variant (which isn't preserved when converting to vcf), and I have converted them to vcf format in order to get additional metadata that I need (in this case, output from bcftools isec, which only takes vcf input and outputs a sites.txt file with the presence/absence of each mutation in each sample). When I try a simple merge between the two datasets based on the Chrom, Pos, Ref, Alt values in each, I am not able to correctly merge the entries where the mutation notation was changed during the conversion. So I need an output file that has both notations recorded, to help backfill the new metadata from the vcf analysis into the correct maf variant entries.
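One possible way to get both notations side by side is to derive the maf-style coordinates from each vcf record by stripping the leading bases shared by ref and alt. The Python sketch below illustrates that idea; `vcf_to_maf_notation` is a hypothetical helper, not a vcf2maf function, and it assumes simple left-anchored variants whose maf entries were not otherwise normalized (for example, left-aligned) during conversion.

```python
def vcf_to_maf_notation(pos, ref, alt):
    """Return (start, end, ref, alt) in maf style for one vcf record."""
    # Drop leading bases shared by ref and alt (the vcf anchor base).
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    if not ref:
        # Insertion: maf lists the two positions flanking the inserted bases.
        start, end = pos - 1, pos
    else:
        # SNP, deletion or substitution: span of the remaining ref bases.
        start, end = pos, pos + len(ref) - 1
    return start, end, ref or "-", alt or "-"


chrom, pos, ref, alt = "15", 66729114, "CGGAACCAGATCATAA", "C"
start, end, maf_ref, maf_alt = vcf_to_maf_notation(pos, ref, alt)

# One row holding both notations: vcf columns first, then maf columns.
combined = (chrom, pos, ref, alt, chrom, start, end, maf_ref, maf_alt)
print("\t".join(str(field) for field in combined))
```

The maf-style columns (chromosome, start, end, ref, alt) can then serve as the merge key against the original maf file, while the vcf-style columns match the sites.txt output.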


