Posted 2 hours ago by T_18

Dear all,

In short, the gene IDs in my GFF files do not match the protein headers in the corresponding sequence files, and my analysis requires them to match. How can I make sure the protein headers are identical to the gene IDs in the GFF file?

I am doing a genome comparison of about 40 species, for which I want to run a microsynteny analysis. For each species I need two inputs: a BED file containing only the gene features, and the protein sequence file.

About 75% of the GFF files are not correct, and the issue is not the same for all of them. Would it be possible to rename the gene IDs in the GFF file based on, for example, the CDS features they contain? If even the CDS IDs are not correct, I suppose the only option is to extract the genes from the full genome assembly using the GFF gene coordinates (which is what I am trying to avoid).
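
One way the CDS-based renaming described above could look, as a minimal sketch rather than a tested pipeline: follow the `Parent` links from CDS to mRNA to gene, and write the gene coordinates to BED with the CDS `protein_id` as the name. The function names `parse_attributes` and `gff_to_bed` are hypothetical, and the sketch assumes a tab-delimited GFF3 that uses `gene`, `mRNA`, and `CDS` feature types with a `protein_id` attribute on the CDS lines. It also assumes one protein per gene: when a gene has several isoforms, only the first `protein_id` encountered is kept.

```python
def parse_attributes(field):
    """Turn 'ID=gene747;Name=RR48_00748' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff_to_bed(gff_lines):
    """Return BED rows (seqid, start0, end, name, score, strand), one per gene.

    The name is the protein_id of the first CDS linked to the gene
    through CDS -> mRNA -> gene Parent attributes (an assumption).
    """
    genes = {}           # gene ID -> (seqid, 0-based start, end, strand)
    rna_to_gene = {}     # mRNA ID -> parent gene ID
    rna_to_protein = {}  # mRNA ID -> first protein_id seen on its CDS lines

    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        seqid, ftype, start, end, strand = cols[0], cols[2], cols[3], cols[4], cols[6]
        attrs = parse_attributes(cols[8])
        if ftype == "gene" and "ID" in attrs:
            # GFF is 1-based inclusive; BED is 0-based half-open.
            genes[attrs["ID"]] = (seqid, int(start) - 1, int(end), strand)
        elif ftype == "mRNA" and "ID" in attrs and "Parent" in attrs:
            rna_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "CDS" and "Parent" in attrs and "protein_id" in attrs:
            rna_to_protein.setdefault(attrs["Parent"], attrs["protein_id"])

    # Resolve mRNA-level protein IDs up to the gene level.
    gene_to_protein = {}
    for rna_id, protein_id in rna_to_protein.items():
        gene_id = rna_to_gene.get(rna_id)
        if gene_id is not None:
            gene_to_protein.setdefault(gene_id, protein_id)

    bed = []
    for gene_id, (seqid, start0, end, strand) in genes.items():
        name = gene_to_protein.get(gene_id)
        if name is not None:  # skip genes without a linked protein
            bed.append((seqid, start0, end, name, 0, strand))
    return bed
```

Each returned row can be written out with `"\t".join(map(str, row))`. Files in which the CDS lines lack `protein_id`, or in which the `Parent` chain is broken, would yield no row for the affected genes, so a count of skipped genes per species would show which files need a different attribute.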

The problem I am facing is that the gene IDs in the GFF file do not correspond to the protein IDs in the sequence file:

GFF file example:

    LADJ01009471.1  Genbank gene    1   4718    .   +   .   ID=gene747;Name=RR48_00748;description=Ethanolaminephosphotransferase 1;end_range=4718%2C.;gbkey=Gene;gene_biotype=protein_coding;locus_tag=RR48_00748;partial=true;stable_id=RR48_00748;start_range=.%2C1
    LADJ01009471.1  Genbank mRNA    1   4718    .   +   .   ID=rna747;Parent=gene747;end_range=4718%2C.;gbkey=mRNA;partial=true;product=Ethanolaminephosphotransferase 1;stable_id=KPJ20932.1;start_range=.%2C1;translation_stable_id=KPJ20932.1
    LADJ01009471.1  Genbank CDS 1   85  .   +   0   Dbxref=InterPro:IPR000462,UniProtKB/Swiss-Prot:Q80TA1,NCBI_GP:KPJ20932.1;ID=cds747;Name=KPJ20932.1;Parent=rna747;gbkey=CDS;partial=true;product=Ethanolaminephosphotransferase 1;protein_id=KPJ20932.1
    LADJ01009471.1  Genbank CDS 1689    1842    .   +   2   Dbxref=InterPro:IPR000462,UniProtKB/Swiss-Prot:Q80TA1,NCBI_GP:KPJ20932.1;ID=cds747;Name=KPJ20932.1;Parent=rna747;gbkey=CDS;partial=true;product=Ethanolaminephosphotransferase 1;protein_id=KPJ20932.1
    LADJ01009471.1  Genbank CDS 2856    3110    .   +   1   Dbxref=InterPro:IPR000462,UniProtKB/Swiss-Prot:Q80TA1,NCBI_GP:KPJ20932.1;ID=cds747;Name=KPJ20932.1;Parent=rna747;gbkey=CDS;partial=true;product=Ethanolaminephosphotransferase 1;protein_id=KPJ20932.1
    LADJ01009471.1  Genbank CDS 4502    4718    .   +   1   Dbxref=InterPro:IPR000462,UniProtKB/Swiss-Prot:Q80TA1,NCBI_GP:KPJ20932.1;ID=cds747;Name=KPJ20932.1;Parent=rna747;gbkey=CDS;partial=true;product=Ethanolaminephosphotransferase 1;protein_id=KPJ20932.1

Protein header corresponding to this part:

    >KPJ20932.1 papilio_machaon_papma1_core_32_85_1 protein Ethanolaminephosphotransferase 1
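
In this example the protein header starts with `KPJ20932.1`, which appears in the GFF as `protein_id` on the CDS lines and as `stable_id` on the mRNA line, not as the gene `ID`. A quick per-species check of how well two ID sets agree could look like the sketch below. The helper names `fasta_ids` and `compare_ids` are hypothetical, and the sketch assumes the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
def fasta_ids(fasta_lines):
    """Collect the first token after '>' from every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def compare_ids(bed_names, fasta_lines):
    """Return (names missing from the FASTA, FASTA IDs missing from the BED)."""
    bed_set = set(bed_names)
    protein_set = fasta_ids(fasta_lines)
    return sorted(bed_set - protein_set), sorted(protein_set - bed_set)
```

Two empty lists mean the name column and the protein headers agree exactly. Non-empty lists show which IDs still need attention for that species.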
