I'm working primarily on A. thaliana and would like to run a GO enrichment analysis with the R package goseq. Because the local database does not support A. thaliana (AT), I need to supply the GO annotation myself. I downloaded the GO annotation for AT from the Gene Ontology website, where the maintainer is listed as TAIR. However, when I tried to read it with the readGFF function in R, it simply returned the error below:
Error in readGFF("~/Downloads/tair.gaf") : reading GFF file: line 1 has less than 8 tab-separated columns
I then checked the file and found that the first few lines are header information rather than annotation records, for example:
!gaf-version: 2.1
!
!Generated by GO Central
!
!Date Generated by GOC: 2020-09-10
!
!Header from source association file:
!=================================
!
!Generated by GO Central
!
!Date Generated by GOC: 2020-09-10
!
!Header from tair source association file:
!=================================
!Project_name: The Arabidopsis Information Resource (TAIR)
!URL: http://www.arabidopsis.org
!Contact Email: [email protected]
!Last Updated: 2020-07-01
!=================================
!
!Header copied from paint_tair_valid.gaf
!=================================
!Created on Tue Sep 1 09:52:00 2020.
!PANTHER version: v.15.0.
!GO version: 2020-08-10.
!
!=================================
!
!Documentation about this header can be found here: https://github.com/geneontology/go-site/blob/master/docs/gaf_validation.md
!
TAIR locus:2008970 AT1G11880 GO:0000009 TAIR:AnalysisReference:501756966 IEA InterPro:IPR007315 F AT1G11880 AT1G11880|F12F1.28|F12F1_28 protein taxon:3702 20200618 InterPro TAIR:locus:2008970
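The excerpt above mixes two kinds of lines: metadata lines that begin with "!" and tab-separated annotation records. A minimal Python sketch of that structure follows. The function name split_gaf is hypothetical, and the sample record assumes the GAF 2.1 layout of 17 tab-separated columns with empty Qualifier and Annotation Extension fields. The exact tab positions are an assumption, because the whitespace in the pasted line was collapsed.

```python
import io

def split_gaf(handle):
    """Separate '!' metadata lines from tab-separated annotation records."""
    header, records = [], []
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("!"):
            header.append(line)  # GAF comment/metadata line
        else:
            records.append(line.split("\t"))  # one annotation record
    return header, records

# Sample: two header lines plus the TAIR record, with tab placement ASSUMED
sample = (
    "!gaf-version: 2.1\n"
    "!Generated by GO Central\n"
    "TAIR\tlocus:2008970\tAT1G11880\t\tGO:0000009\tTAIR:AnalysisReference:501756966\t"
    "IEA\tInterPro:IPR007315\tF\tAT1G11880\tAT1G11880|F12F1.28|F12F1_28\tprotein\t"
    "taxon:3702\t20200618\tInterPro\t\tTAIR:locus:2008970\n"
)

header, records = split_gaf(io.StringIO(sample))
print(len(header), len(records), len(records[0]))  # 2 1 17
```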
So I deleted those lines and ran readGFF again, but got another error:
Error in readGFF("~/Downloads/tair.gaf") : reading GFF file: line 1 has more than 9 tab-separated columns
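Both errors are consistent with a plain column count. From the two messages, readGFF appears to accept only lines with 8 or 9 tab-separated columns. A "!" header line contains no tabs at all (1 column), while a GAF record has far more. The hypothetical Python check below illustrates this. The names column_count and fits_gff are illustrative only, and the 17-field record assumes the GAF 2.1 layout with empty Qualifier and Annotation Extension fields.

```python
GFF_MIN_COLS, GFF_MAX_COLS = 8, 9  # bounds implied by the two readGFF errors

def column_count(line):
    """Number of tab-separated columns on one line of text."""
    return len(line.rstrip("\n").split("\t"))

def fits_gff(line):
    """True if the line has a column count inside the bounds readGFF reports."""
    return GFF_MIN_COLS <= column_count(line) <= GFF_MAX_COLS

header_line = "!gaf-version: 2.1"
# The TAIR record as 17 GAF 2.1 fields (empty Qualifier/Extension are assumptions)
gaf_record = "\t".join([
    "TAIR", "locus:2008970", "AT1G11880", "", "GO:0000009",
    "TAIR:AnalysisReference:501756966", "IEA", "InterPro:IPR007315", "F",
    "AT1G11880", "AT1G11880|F12F1.28|F12F1_28", "protein", "taxon:3702",
    "20200618", "InterPro", "", "TAIR:locus:2008970",
])

for name, line in [("header", header_line), ("record", gaf_record)]:
    print(name, column_count(line), fits_gff(line))
# header 1 False
# record 17 False
```

With the header removed, readGFF therefore moves from "too few" columns on line 1 to "too many", which matches the second error.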
Basically, the file's format does not seem to match what readGFF expects. Did I download the wrong file, or should I use a different function in R to read it?
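Another route is to treat the GAF as the plain tab-separated table it is and reduce it to gene-to-GO pairs. As I understand the goseq documentation, goseq() can take such a mapping through its gene2cat argument as a two-column data frame of gene and category IDs. The minimal Python sketch below performs that conversion under several assumptions. Column indices follow the GAF 2.x layout, with the Qualifier in column 4 and the GO ID in column 5. The AGI locus code is picked from the symbol, name, or synonym columns by a regular expression. Rows whose Qualifier contains NOT are dropped, because they state that the gene is not associated with the term. The function names and the demo rows are hypothetical.

```python
import csv
import io
import re

# AGI locus codes such as AT1G11880 (chromosomes 1-5, C = chloroplast, M = mitochondrion)
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)

def gaf_to_gene2cat(handle):
    """Return sorted, de-duplicated (AGI locus, GO ID) pairs from a GAF stream."""
    pairs = set()
    for raw in handle:
        if raw.startswith("!") or not raw.strip():
            continue  # skip header/comment lines and blank lines
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a GAF record (GAF 1.0 has 15 columns, 2.x has 17)
        if "NOT" in cols[3].split("|"):
            continue  # negated annotation: gene is NOT associated with the term
        # Look for an AGI code in DB Object Symbol, DB Object Name, then synonyms
        candidates = [cols[2], cols[9], *cols[10].split("|")]
        locus = next((c.upper() for c in candidates if AGI_PATTERN.match(c)), None)
        if locus is not None:
            pairs.add((locus, cols[4]))
    return sorted(pairs)

def write_gene2cat(pairs, out):
    """Write pairs as a two-column, tab-separated table with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "category"])
    writer.writerows(pairs)

def fake_record(symbol, qualifier, go_id, evidence):
    """Build a hypothetical 17-column GAF 2.1 line, for demonstration only."""
    fields = ["TAIR", "locus:0", symbol, qualifier, go_id, "ref", evidence,
              "", "F", symbol, symbol, "protein", "taxon:3702",
              "20200618", "TAIR", "", ""]
    return "\t".join(fields) + "\n"

# Made-up rows: a duplicate pair with other evidence, and one negated annotation
sample = (
    "!gaf-version: 2.1\n"
    + fake_record("AT1G11880", "", "GO:0000009", "IEA")
    + fake_record("AT1G11880", "", "GO:0000009", "IBA")
    + fake_record("AT1G11880", "NOT", "GO:0005515", "IEA")
    + fake_record("AT1G01010", "", "GO:0003700", "ISS")
)

pairs = gaf_to_gene2cat(io.StringIO(sample))
# pairs == [("AT1G01010", "GO:0003700"), ("AT1G11880", "GO:0000009")]
buf = io.StringIO()
write_gene2cat(pairs, buf)
print(buf.getvalue(), end="")
```

For the real file, the same two functions could be applied to open("tair.gaf") and an output file such as tair_gene2cat.tsv (a hypothetical name). That table could then be loaded in R with read.delim() and passed as goseq(pwf, gene2cat = ...). It is worth checking ?goseq for the exact gene2cat format that your installed version expects.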
Any ideas are welcome. Many thanks in advance!