Weird characters in yeast reference gff3


I am looking at the yeast reference annotation (in GFF3 format) downloaded from either SGD or Ensembl Fungi. In both cases, the GFF3 file contains odd character sequences in the attributes field (column 9), which cause me a lot of trouble downstream. Example:

chrXVI  SGD     gene    174343  174756  .       -       .       ID=YPL197C;Name=YPL197C;Ontology_term=GO:0003674,GO:0005575,GO:0008150;Note=Dubious%20open%20reading%20frame%3B%20unlikely%20to%20encode%20a%20functional%20protein%2C%20based%20on%20available%20experimental%20and%20comparative%20sequence%20data%3B%20partially%20overlaps%20the%20ribosomal%20gene%20RPB7B;display=Dubious%20open%20reading%20frame;dbxref=SGD:S000006118;orf_classification=Dubious

See the "%20" and "%3B" sequences?
As far as I understand, these are hex (percent-encoded) representations of certain characters, here a space and a semicolon. Why are they included this way, and how can I get rid of them?
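For illustration, here is a minimal sketch of how the attributes could be decoded, assuming the sequences are standard URL-style percent-escapes (the GFF3 specification reserves ";", "=", "&" and "," in column 9, so they have to be escaped inside values). The function name `parse_gff3_attributes` is hypothetical, and the example line is a shortened version of the record above with tabs restored.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Split a GFF3 column-9 string into a dict of decoded value lists."""
    attributes = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ";"
        key, _, value = pair.partition("=")
        # Split multi-valued attributes on literal "," first, then decode,
        # so an escaped comma (%2C) inside a value is not mistaken for a separator.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Shortened, tab-separated version of the YPL197C record (assumption: tabs restored).
line = (
    "chrXVI\tSGD\tgene\t174343\t174756\t.\t-\t.\t"
    "ID=YPL197C;Name=YPL197C;"
    "Note=Dubious%20open%20reading%20frame%3B%20unlikely;"
    "display=Dubious%20open%20reading%20frame"
)
columns = line.split("\t")
attrs = parse_gff3_attributes(columns[8])
print(attrs["Note"][0])
```

Decoding only after splitting on ";" and "=" matters here: running a blanket search-and-replace over the whole file would turn "%3B" into a literal ";" inside the Note value, and downstream parsers would then split the attribute in the wrong place.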

To view the full GFF file, download the genome release, extract the tar.gz, and look at the file saccharomyces_cerevisiae_R64-2-1_20150113.gff


SGD


gff3


yeast


gff
