Asked 2 hours ago by Maxime


I need to do some protein-level modeling in the context of ASD in order to analyse variants. I started from a whole exome sequencing dataset and created a consensus sequence for each gene with samtools/bcftools by applying an existing VCF file. As I understand it, these consensus sequences are the reference genome sequences with the mutations from the VCF file applied.

I now need to get from these consensus sequences to the corresponding protein sequences, and I don't know how.

I retrieved the cDNA and coding sequences for all the genes from BioMart and tried aligning them to the consensus sequences, to see where they would align and so identify the introns, but I wasn't able to get any usable information out of the alignments.
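To make the goal concrete, here is a minimal sketch of the step I am after, assuming the coding exon coordinates are already known (for example from an annotation file) instead of being inferred by alignment. The function names `splice_cds` and `translate` and the toy sequence are made up for illustration. It also assumes the consensus contains only substitutions, since indels would shift the exon coordinates.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def splice_cds(gene_seq, exons, strand="+"):
    """Join the coding exons of a gene sequence into one CDS.

    exons: (start, end) pairs, 0-based and half-open, given on the
    forward strand of gene_seq (hypothetical input format).
    """
    cds = "".join(gene_seq[start:end] for start, end in sorted(exons))
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon.

    Codons with ambiguous bases are written as 'X'.
    """
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy gene: two coding exons separated by a 12 bp intron.
reference = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

# Toy consensus: the same gene with one substitution (C>T at index 4).
consensus = reference[:4] + "T" + reference[5:]

print(translate(splice_cds(reference, exons)))  # MAK
print(translate(splice_cds(consensus, exons)))  # MVK
```

With real data the exon coordinates would have to be converted from genomic positions to positions within each per-gene consensus sequence, and one transcript would have to be chosen per gene.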

Any help would be appreciated, thank you.

TL;DR: I have VCF files with variants that I need to analyse at the protein level by modelling. How can I get the protein sequences carrying the variants from the VCF?


