I have a VCF file produced by GATK. It contains 48 populations of one species, sampled from different locations and host species, with variants called on 50 scaffolds.
I need to run statistical tests and other comparisons between populations.
Because statistics such as Fst are usually computed between pairs of populations (and Tajima's D within one population at a time), I need to combine my populations into groups.
Populations A, B, and C come from the first location, and populations D, E, F, and G come from the second location.
Populations A, B, and C also share one host species, while the other populations belong to a different host.
I would like advice on grouping these 7 populations into 2 groups.
Is it okay to put the data for populations A, B, and C into one file and the data for D, E, F, and G into another, and then treat them as two populations for statistical tests and further downstream analysis (e.g., tests for natural selection)?
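To make the grouping concrete, here is a minimal sketch of what I have in mind. Instead of splitting the VCF itself, it writes one sample list per group. Everything named here is an assumption for illustration: the sample IDs, the `location1`/`location2` labels, and the helper functions are hypothetical. The only external tool it refers to is VCFtools, whose `--weir-fst-pop` option takes a text file with one sample ID per line; the command is only built as a list, not executed.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical population -> group assignment, following the locations above.
POP_TO_GROUP = {
    "A": "location1", "B": "location1", "C": "location1",
    "D": "location2", "E": "location2", "F": "location2", "G": "location2",
}


def group_samples(sample_to_pop, pop_to_group):
    """Return {group: sorted list of sample IDs} from a sample -> population map."""
    groups = defaultdict(list)
    for sample, pop in sample_to_pop.items():
        groups[pop_to_group[pop]].append(sample)
    return {group: sorted(samples) for group, samples in groups.items()}


def write_group_files(groups, out_dir):
    """Write one text file per group (one sample ID per line); return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for group, samples in groups.items():
        path = out_dir / f"{group}.txt"
        path.write_text("\n".join(samples) + "\n")
        paths[group] = path
    return paths


def vcftools_fst_command(vcf_path, group_files, out_prefix):
    """Build (but do not run) a VCFtools Weir and Cockerham Fst command."""
    cmd = ["vcftools", "--vcf", str(vcf_path)]
    for group_file in group_files:
        cmd += ["--weir-fst-pop", str(group_file)]
    cmd += ["--out", str(out_prefix)]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample IDs; real ones must match the VCF header exactly.
    sample_to_pop = {"A_01": "A", "B_01": "B", "C_01": "C", "D_01": "D", "G_01": "G"}
    groups = group_samples(sample_to_pop, POP_TO_GROUP)
    print(groups)
    print(" ".join(vcftools_fst_command(
        "input.vcf", ["location1.txt", "location2.txt"], "loc1_vs_loc2")))
```

The same per-group sample lists could also be passed to VCFtools `--keep` to compute Tajima's D (`--TajimaD` with a window size) within each group separately. Swapping `POP_TO_GROUP` for a host-based assignment would produce the host grouping in the same way.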
Any advice on the approach and on suitable tools would be appreciated.