Posted 2 hours ago by nlehmann (France)

Hello,

I am trying to load into GenomicFeatures a GTF file that was produced by a de novo transcript assembler (StringTie or Scallop) and then annotated with Gffcompare. For that, I use txdb <- makeTxDbFromGFF("gffcmp.annotated.gtf", format="gtf"). The call runs without error, but the resulting TxDb object is empty, e.g.:

> transcripts(txdb)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     tx_id     tx_name
      <Rle> <IRanges>  <Rle> | <integer> <character>
  -------
  seqinfo: no sequences

Here are the first lines of the GTF:

$ head gffcmp.annotated.gtf
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "n"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   33187   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop exon    33287   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "3";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "j"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XR_003076321.1"; class_code "c"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   32331   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "2";

The file gffcmp.annotated.gtf has 1,218,427 lines. It contains no UTR features, only "transcript" and "exon" records.
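
For anyone who wants to check the feature types in such a file, here is a minimal shell sketch. The function name gtf_feature_counts is made up for illustration, and the sketch assumes the file is tab-separated with the feature type in column 3, as the GTF format specifies.

```shell
# Hypothetical helper (not part of any tool): count how often each
# feature type (column 3) occurs in a GTF file.
# Assumes tab-separated fields; lines starting with '#' are skipped.
gtf_feature_counts() {
    awk -F'\t' '!/^#/ { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Usage with the file from this post:
#   gtf_feature_counts gffcmp.annotated.gtf
#   wc -l gffcmp.annotated.gtf
```

If the fields were separated by spaces instead of tabs, column 3 would be empty under this sketch and the counts would not show "transcript" and "exon" as separate entries.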

Can you see a reason why the TxDb object is empty? What could I change?
