Poland/Kraków/Institute of Nature Conservation

Dear All,
I have been struggling with PopGenome for a while and have run out of ideas. I have SNP data in VCF format from several fragments of a few genes, a reference FASTA with the same names as in the VCF, and a GTF like:

1 VCF CDS 68 124 . + 2 gene_id "Gene.100265";
2 VCF CDS 126 405 . + 1 gene_id "Gene.100265";
3 VCF CDS 447 820 . + 1 gene_id "Gene.100265";
4 VCF CDS 864 1078 . + 1 gene_id "Gene.100265";

The genes are sequenced in a few hundred individuals. Each gene was sequenced in a few fragments and the fragments are the same for all individuals.
I do not know how to make PopGenome include information on the coding sequence. I used:
PGfile <- PopGenome::readData("./vcf/",gffpath = "./gtf/", format = "VCF", include.unknown = TRUE)
PGfile <- set.synnonsyn(PGfile, ref.chr=paste0("./FASTA/references_uncoded_246_.fasta"))

PGfile@region.data@CodingSNPS are all TRUE, but PGfile@region.data@ExonSNPS are all FALSE. For some reason I get only a fraction of the information from PGfile@region.data@codons, and PGfile@region.data@n.nucleotides is NULL anyway.

I guess this might be a problem with the GTF file format, but I found no information on how it should be formatted for PopGenome. I tried "Gene", "Exon" and "CDS" (as above) in the third column, but nothing changed.
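
One check that does not depend on PopGenome is whether the GTF really is tab-delimited with the nine standard GFF/GTF columns, since the listing above shows the fields separated by spaces. Below is a minimal base-R sketch of such a check. The helper `check_gtf` and the temporary files are hypothetical illustrations, and passing this check does not confirm what PopGenome itself expects.

```r
# Hypothetical helper (not part of PopGenome): returns TRUE if every
# non-comment line has nine tab-separated fields, start <= end, and a
# valid strand symbol.
check_gtf <- function(path) {
  n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "#")
  if (length(n_fields) == 0 || !all(n_fields == 9)) {
    return(FALSE)
  }
  gtf <- read.table(path, sep = "\t", quote = "", comment.char = "#",
                    header = FALSE, stringsAsFactors = FALSE)
  names(gtf) <- c("seqname", "source", "feature", "start", "end",
                  "score", "strand", "frame", "attribute")
  all(gtf$start <= gtf$end) && all(gtf$strand %in% c("+", "-", "."))
}

# Demo on two temporary files built from the first GTF line above:
# one tab-delimited, one space-delimited.
tab_file <- tempfile(fileext = ".gtf")
writeLines(paste("1", "VCF", "CDS", "68", "124", ".", "+", "2",
                 'gene_id "Gene.100265";', sep = "\t"), tab_file)

space_file <- tempfile(fileext = ".gtf")
writeLines('1 VCF CDS 68 124 . + 2 gene_id "Gene.100265";', space_file)

tab_ok <- check_gtf(tab_file)      # TRUE: nine tab-separated fields
space_ok <- check_gtf(space_file)  # FALSE: the line is a single field
```

If the real file fails this check, re-exporting it with tabs as the delimiter would be the first thing to try before changing the feature names in the third column.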

Does anyone have an idea what I did wrong?


Maciek Konopiński

The session info is as follows (all necessary packages are loaded):

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

