Both versions are available on the FTP site. GCF_000001405.25 is the RefSeq assembly accession corresponding to GRCh37.p13.

RefSNP VCF files for GRC (Genome Reference Consortium) human assembly
37 (GCF_000001405.25) and 38 (GCF_000001405.38). Files are compressed
by bgzip and with the tabix index.

Source: ftp.ncbi.nih.gov/snp/archive/b153/00readme.txt

dbSNP build 154 is out, so here is the script I use to preprocess it. It downloads the VCFs and renames the RefSeq contig names to UCSC-style names:

## 05/09/2021: 2020-05-26 13:48 -- dbSNP154
# VCFs, tabix indexes and MD5 files (.25 = GRCh37.p13, .38 = GRCh38.p12); wget saves into the current directory by default
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.tbi
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.25.gz.tbi.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.md5
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.tbi
wget https://ftp.ncbi.nih.gov/snp/archive/b154/VCF/GCF_000001405.38.gz.tbi.md5
# Assembly reports, used below to build the contig-name maps
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt
wget https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.38_GRCh38.p12_assembly_report.txt
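# Map RefSeq accession (column 7) to UCSC-style name (column 10); fall back to the GenBank accession (column 5) when the UCSC name is "na"
# RS accepts both LF and CRLF line endings (a regex RS needs gawk or mawk)
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.25_GRCh37.p13_assembly_report.txt > dbSNP-to-UCSC-GRCh37.p13.map
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.38_GRCh38.p12_assembly_report.txt > dbSNP-to-UCSC-GRCh38.p12.map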
#sed -i '{s/chrX/23/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chrY/24/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chrM/25/g}' dbSNP-to-UCSC-GRCh37.p13.map
#sed -i '{s/chr//g}' dbSNP-to-UCSC-GRCh37.p13.map
# -Oz requests bgzip-compressed output explicitly so the content matches the .vcf.gz name
sbatch --job-name=dbsnp154 --output=dbsnp154.out ~/bin/sbatch.sh 'bcftools annotate --threads 48 --rename-chrs dbSNP-to-UCSC-GRCh37.p13.map GCF_000001405.25.gz -Oz -o dbSNP154.hg19.vcf.gz'
sbatch --job-name=hg38 --mem=24G --output=hg38 ~/bin/sbatch.sh 'bcftools annotate --threads 48 --rename-chrs dbSNP-to-UCSC-GRCh38.p12.map GCF_000001405.38.gz -Oz -o dbSNP154.hg38.vcf.gz'
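
If you want to see what the awk mapping step produces before running it on the full assembly report, here is a minimal sketch on a two-row sample in the assembly-report layout (tab-separated, CRLF line endings). The file names `sample_report.txt` and `sample.map` are made up for illustration, and the second row is an invented entry whose UCSC name is "na", just to show the fallback. This variant strips the trailing carriage return with `sub()` instead of a regex `RS`, so it should also work in awk implementations without regex record separators.

```shell
# Hypothetical sample in assembly-report layout: col 5 = GenBank accession,
# col 7 = RefSeq accession, col 10 = UCSC-style name. Row 2 is invented.
printf '# Sequence-Name\tSequence-Role\t(header line, skipped)\r\n' > sample_report.txt
printf '1\tassembled-molecule\t1\tChromosome\tCM000663.1\t=\tNC_000001.10\tPrimary Assembly\t249250621\tchr1\r\n' >> sample_report.txt
printf 'EXAMPLE\tunplaced-scaffold\tna\tna\tGL000000.1\t=\tNT_000000.1\tPrimary Assembly\t1000\tna\r\n' >> sample_report.txt

# Same logic as the mapping step: print "RefSeq UCSC", or "RefSeq GenBank" when UCSC is "na".
awk 'BEGIN { FS="\t" }
     { sub(/\r$/, "") }          # drop a trailing CR, if any
     !/^#/ { if ($10 != "na") print $7, $10; else print $7, $5 }' \
    sample_report.txt > sample.map

cat sample.map
```

Each output line is an "old_name new_name" pair separated by whitespace, which is the layout `bcftools annotate --rename-chrs` reads.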

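The script downloads the .md5 files but never uses them, so it may be worth checking the downloads before submitting the jobs. Below is a rough sketch, not part of the original script. `verify_md5` is a hypothetical helper, and I am assuming only that each .md5 file contains the 32-character hex digest somewhere in its text (I have not checked the exact layout of the NCBI files) and that GNU `md5sum` is installed.

```shell
# Hypothetical helper: compare a file's MD5 with the first 32-hex-digit
# string found in its .md5 file (defaults to "<file>.md5").
verify_md5() {
    file="$1"
    md5file="${2:-$1.md5}"
    expected=$(grep -oiE '[0-9a-f]{32}' "$md5file" | head -n 1 | tr 'A-F' 'a-f')
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Check whichever of the downloaded files are present in the current directory.
for f in GCF_000001405.25.gz GCF_000001405.25.gz.tbi GCF_000001405.38.gz GCF_000001405.38.gz.tbi; do
    if [ -e "$f" ]; then
        verify_md5 "$f"
    fi
done
```

A mismatch usually means a truncated download, in which case re-running the corresponding wget is the first thing to try.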
