I have ChIP-seq peaks aligned to the NCBI genome, which look like the following:
seqnames ranges strand | Conc
<Rle> <IRanges> <Rle> | <numeric>
X0001.0806584 NC_000003.12 75668920-75669635 * | 11.10671
X0002.1092998 NC_000005.10 34190588-34192289 * | 8.45169
X0002.1092999 NC_000005.10 34190588-34192289 * | 8.45169
X0003.1726283 NC_000009.12 137101991-137103797 * | 8.30861
Although I used human_NCBI_GRCh38p12 for alignment, the annotation I get from Bioconductor uses chromosome-style sequence names, like the following:
TxDb.Hsapiens.UCSC.hg38.knownGene
ucsc.hg38.knownGene <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
seqnames ranges strand | gene_id
<Rle> <IRanges> <Rle> | <character>
1 chr19 58345178-58362751 - | 1
10 chr8 18391282-18401218 + | 10
100 chr20 44619522-44652233 - | 100
1000 chr18 27950966-28177130 - | 1000
100009613 chr11 70072434-70075433 - | 100009613
The annotation is not working because of the difference in seqnames:
peaks_annotated <- annotatePeakInBatch(Peaks, AnnotationData = ucsc.hg38.knownGene)
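One possible way around the mismatch, sketched here under assumptions rather than as a tested solution for the full data set, is to rename the RefSeq accessions in the peaks to UCSC-style names before calling annotatePeakInBatch. The snippet builds toy peaks from the rows shown above. The object names `peaks` and `refseq2ucsc` are made up for illustration, and the lookup covers only the three accessions that appear in the example (a complete table would have to come from the NCBI assembly report for GRCh38).

```r
library(GenomicRanges)
library(GenomeInfoDb)

# Toy peaks mimicking the question: RefSeq accessions as seqnames
peaks <- GRanges(
  seqnames = c("NC_000003.12", "NC_000005.10", "NC_000009.12"),
  ranges   = IRanges(start = c(75668920, 34190588, 137101991),
                     end   = c(75669635, 34192289, 137103797)),
  Conc     = c(11.10671, 8.45169, 8.30861)
)

# Partial RefSeq -> UCSC lookup (assumption: extend it to every
# chromosome you need, using the GRCh38 assembly report)
refseq2ucsc <- c(
  "NC_000003.12" = "chr3",
  "NC_000005.10" = "chr5",
  "NC_000009.12" = "chr9"
)

# Drop peaks on sequences that have no entry in the lookup
known <- intersect(seqlevels(peaks), names(refseq2ucsc))
peaks <- keepSeqlevels(peaks, known, pruning.mode = "coarse")

# Rename the remaining seqlevels to UCSC style
peaks <- renameSeqlevels(peaks, refseq2ucsc[seqlevels(peaks)])

seqlevels(peaks)
```

After the rename, the peaks and ucsc.hg38.knownGene use the same seqnames style, so an overlap-based annotation should be able to match them. Peaks on sequences missing from the lookup (for example unplaced scaffolds) are dropped by the keepSeqlevels step.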
Using GCF_000001405.39_GRCh38.p13_genomic.gtf directly results in about 1 million genes for only 2000 peaks.
Any help in solving this problem would be highly appreciated.