I have ChIP-seq peaks called against the NCBI genome, which look like this:

                    seqnames              ranges strand |      Conc
                       <Rle>           <IRanges>  <Rle> | <numeric>
  X0001.0806584 NC_000003.12   75668920-75669635      * |  11.10671
  X0002.1092998 NC_000005.10   34190588-34192289      * |   8.45169
  X0002.1092999 NC_000005.10   34190588-34192289      * |   8.45169
  X0003.1726283 NC_000009.12 137101991-137103797      * |   8.30861

I used human_NCBI_GRCh38p12 for alignment, but the annotation I load from Bioconductor names its sequences by chromosome (chr1, chr2, ...) rather than by RefSeq accession, as shown here:

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
ucsc.hg38.knownGene <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

            seqnames              ranges strand |     gene_id
               <Rle>           <IRanges>  <Rle> | <character>
          1    chr19   58345178-58362751      - |           1
         10     chr8   18391282-18401218      + |          10
        100    chr20   44619522-44652233      - |         100
       1000    chr18   27950966-28177130      - |        1000
  100009613    chr11   70072434-70075433      - |   100009613

The annotation step does not work because the seqnames differ between the peaks and the annotation.

library(ChIPpeakAnno)  # provides annotatePeakInBatch()
peaks_annotated <- annotatePeakInBatch(Peaks, AnnotationData = ucsc.hg38.knownGene)
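
One way to reconcile the two naming schemes is to translate the RefSeq accessions into UCSC-style names before annotating. The following is only a minimal sketch, assuming that just the primary chromosomes (NC_0000NN.v) and the mitochondrion need translating. `refseq_to_ucsc` is a hypothetical helper written for illustration, not a function from any package. For a complete table that also covers scaffolds and patches, GenomeInfoDb's `getChromInfoFromNCBI()` should report both the RefSeq accession and the UCSC-style name of each sequence.

```r
# Hypothetical helper (not from any package): convert human RefSeq chromosome
# accessions such as "NC_000003.12" to UCSC-style names such as "chr3".
# Anything that is not a primary chromosome or the mitochondrion
# (e.g. NT_/NW_ scaffolds) is returned as NA.
refseq_to_ucsc <- function(acc) {
  out <- rep(NA_character_, length(acc))
  # Nuclear chromosomes: NC_000001 .. NC_000024, with any version suffix.
  is_chr <- grepl("^NC_0000(0[1-9]|1[0-9]|2[0-4])\\.[0-9]+$", acc)
  num <- as.integer(sub("^NC_0000([0-9]{2})\\.[0-9]+$", "\\1", acc[is_chr]))
  label <- as.character(num)
  label[num == 23L] <- "X"  # NC_000023 is chromosome X
  label[num == 24L] <- "Y"  # NC_000024 is chromosome Y
  out[is_chr] <- sprintf("chr%s", label)
  # Human mitochondrial genome.
  out[grepl("^NC_012920\\.[0-9]+$", acc)] <- "chrM"
  out
}

refseq_to_ucsc(c("NC_000003.12", "NC_000005.10", "NC_000009.12"))
# [1] "chr3" "chr5" "chr9"

# Possible use on the GRanges `Peaks` (needs Bioconductor's GenomeInfoDb):
# library(GenomeInfoDb)
# keep  <- !is.na(refseq_to_ucsc(seqlevels(Peaks)))
# Peaks <- keepSeqlevels(Peaks, seqlevels(Peaks)[keep], pruning.mode = "coarse")
# seqlevels(Peaks) <- refseq_to_ucsc(seqlevels(Peaks))
```

Peaks on unplaced scaffolds are dropped in this sketch, since the UCSC knownGene annotation is matched by chromosome name.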

Using GCF_000001405.39_GRCh38.p13_genomic.gtf directly as the annotation returns about 1 million genes for only 2,000 peaks.
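
One possible cause of the inflated count, offered as an assumption rather than a diagnosis, is that a GTF file holds one row per feature (gene, transcript, exon, CDS, ...), so using every row as annotation counts each gene many times. The sketch below uses a few invented GTF lines to show how keeping only the rows whose feature type is "gene" reduces the table. The toy records and column names are made up for illustration.

```r
# Toy GTF lines, invented for illustration (not taken from the real file).
toy_gtf <- c(
  "NC_000003.12\tBestRefSeq\tgene\t100\t900\t.\t+\t.\tgene_id \"GENE_A\";",
  "NC_000003.12\tBestRefSeq\ttranscript\t100\t900\t.\t+\t.\tgene_id \"GENE_A\"; transcript_id \"TX_A1\";",
  "NC_000003.12\tBestRefSeq\texon\t100\t300\t.\t+\t.\tgene_id \"GENE_A\"; transcript_id \"TX_A1\";",
  "NC_000003.12\tBestRefSeq\texon\t600\t900\t.\t+\t.\tgene_id \"GENE_A\"; transcript_id \"TX_A1\";"
)

# GTF is tab-separated with nine fixed columns; the third is the feature type.
gtf <- read.delim(text = toy_gtf, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "type", "start", "end",
                                "score", "strand", "frame", "attributes"))

# Keep one row per gene instead of one row per feature.
genes_only <- gtf[gtf$type == "gene", ]

nrow(gtf)         # all features
nrow(genes_only)  # gene records only

# With the real file, the same filter could be applied to the GRanges
# returned by rtracklayer (assuming it is installed):
# gtf <- rtracklayer::import("GCF_000001405.39_GRCh38.p13_genomic.gtf")
# genes_only <- gtf[gtf$type == "gene"]
```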

Any help in solving this problem would be greatly appreciated.


