Method 1:

  1. Download the relevant annotation files from GENCODE for human hg19 and mouse mm10.
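  2. Fish out the Ensembl IDs of the mitochondrial genes from each file. This is the hg19 example; the `sed` drops the version suffix so the IDs can be pasted into BioMart's gene stable ID filter:
    awk -F '\t' '$1 == "chrM" && $3 == "gene"' gencode.v37lift37.annotation.gtf | cut -d '"' -f 2 | sed 's/\..*//'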
  3. Use BioMart to retrieve the sequences for those IDs.
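The ID extraction in step 2 can also be wrapped in a small reusable function. This is a minimal sketch assuming a GENCODE-style, tab-separated GTF with `gene_id` in column 9; it matches column 3 exactly (so transcript and exon rows are skipped), pulls the value of `gene_id` rather than the first quoted field, strips the version suffix and removes duplicates. The function name `chrm_gene_ids` and the demo rows are hypothetical and only for illustration.

```shell
#!/bin/sh
# Sketch: list unique, unversioned Ensembl gene IDs for "gene" features on chrM.
# Assumes a GENCODE-style GTF (tab-separated, gene_id in column 9) on stdin.
# chrm_gene_ids is a hypothetical helper name.
chrm_gene_ids() {
  awk -F '\t' '$1 == "chrM" && $3 == "gene" {
    # Column 9 looks like: gene_id "ENSG...<version>"; gene_type "..."; ...
    if (match($9, /gene_id "[^"]+"/)) {
      id = substr($9, RSTART + 9, RLENGTH - 10)  # text between the quotes
      sub(/\..*$/, "", id)                        # drop the version suffix
      print id
    }
  }' | sort -u
}

# Demo on three made-up GTF rows; the chr1 row is filtered out.
{
  printf 'chrM\tENSEMBL\tgene\t577\t647\t.\t+\t.\tgene_id "ENSG00000210049.1_1"; gene_type "Mt_tRNA";\n'
  printf 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5_2"; gene_type "pseudogene";\n'
  printf 'chrM\tENSEMBL\tgene\t648\t1601\t.\t+\t.\tgene_id "ENSG00000211459.2_1"; gene_type "Mt_rRNA";\n'
} | chrm_gene_ids
```

On a real file you would run `chrm_gene_ids < gencode.v37lift37.annotation.gtf > chrM_ids.txt` (output name is illustrative) and paste the list into BioMart.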

Method 2:

  1. Download the annotation file from NCBI for human hg19 or mouse mm10.
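  2. Get the sequences using Entrez Direct (EDirect). The last awk field passes each gene's strand to efetch (1 = plus, 2 = minus), so minus-strand genes come out in their own orientation. For human hg19:
    awk -F '\t' '$1 ~ /^NC_012920/ && $3 == "gene" {print $1, $4, $5, ($7 == "-" ? 2 : 1)}' GRCh37_latest_genomic.gff | xargs -n 4 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -strand "$3" -format fasta'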

  3. For mouse mm10:
    awk -F '\t' '$1 ~ /^NC_005089/ && $3 == "gene" {print $1, $4, $5, ($7 == "-" ? 2 : 1)}' GCF_000001635.26_GRCm38.p6_genomic.gtf | xargs -n 4 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -strand "$3" -format fasta'
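The human and mouse one-liners differ only in the accession and the annotation file, so they can be folded into one function. This is a sketch, not part of EDirect: `mito_efetch` and the `DRY_RUN` switch are hypothetical names, and it assumes a tab-separated NCBI GFF3/GTF where column 7 holds the strand. A prefix match lets `NC_012920` match the versioned sequence ID in column 1.

```shell
#!/bin/sh
# Sketch: fetch every "gene" feature of one accession from an NCBI GFF3/GTF via efetch.
# Usage: mito_efetch ACCESSION ANNOTATION_FILE   (DRY_RUN=1 only prints the commands)
# mito_efetch is a hypothetical helper name, not part of Entrez Direct.
mito_efetch() {
  acc="$1"
  annot="$2"
  # Column 1 = sequence ID, 3 = feature type, 4/5 = start/end, 7 = strand.
  awk -F '\t' -v acc="$acc" 'index($1, acc) == 1 && $3 == "gene" {
    print $1, $4, $5, ($7 == "-" ? 2 : 1)   # efetch -strand: 1 = plus, 2 = minus
  }' "$annot" |
  while read -r id start stop strand; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "efetch -db nuccore -id $id -seq_start $start -seq_stop $stop -strand $strand -format fasta"
    else
      # </dev/null keeps efetch from consuming the remaining loop input on stdin
      efetch -db nuccore -id "$id" -seq_start "$start" -seq_stop "$stop" \
        -strand "$strand" -format fasta </dev/null
    fi
  done
}
```

For example, `DRY_RUN=1 mito_efetch NC_005089 GCF_000001635.26_GRCm38.p6_genomic.gtf` lists the calls for mouse without touching the network, and dropping `DRY_RUN=1` runs them.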

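There are several tools and repositories for this. Note that GRCh37 (hg19) has been archived on Ensembl, so BioMart queries for hg19 go through the GRCh37 archive site (grch37.ensembl.org).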

Here are some tips:

  • Make sure you have a good reason not to align your reads against a newer assembly.
  • Make sure you understand what "fasta" and "gtf/gff" files are.
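As a quick sanity check on the FASTA that efetch or BioMart returns, a sketch like the one below prints each record's header and sequence length; for a gene fetched with `-seq_start`/`-seq_stop` the length should equal stop - start + 1. The function name `fasta_lengths` is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<header><TAB><sequence length>" for each record of a FASTA file (or stdin).
# fasta_lengths is a hypothetical helper name.
fasta_lengths() {
  awk '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    /^>/ {
      if (seen) print name "\t" len          # flush the previous record
      name = substr($0, 2); len = 0; seen = 1
      next
    }
    { gsub(/[ \t]/, ""); len += length($0) } # count residues, ignoring whitespace
    END { if (seen) print name "\t" len }
  ' "$@"
}
```

For instance, `fasta_lengths hg19_chrM_genes.fa` (file name illustrative) lists one line per fetched gene.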

Some tools you might want to read about are:

Good luck; finding the data for mm10 should be straightforward now.