Posted 2 hours ago by nirkoty
Hi,

I am using StringTie v2.1.1 on a single BAM file. I end up with a GTF file, but some transcripts appear to be duplicated, for example:

chr1    StringTie   transcript  729898  732218  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.1"; cov "9.134856"; FPKM "1.605420"; TPM "2.896626";
chr1    StringTie   exon    729898  729955  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "1"; cov "5.454021";
chr1    StringTie   exon    732017  732218  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "2"; cov "10.191729";
chr1    StringTie   transcript  729898  732218  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.2"; cov "3.270196"; FPKM "0.574726"; TPM "1.036966";
chr1    StringTie   exon    729898  729955  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "1"; cov "1.947918";
chr1    StringTie   exon    732013  732218  1000    -   .   gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "2"; cov "3.642488";

In this example, I have two transcripts that start and end at the same positions. They also have the same exons, except that the second exon starts at position 732017 in one transcript and at position 732013 in the other.
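To check exactly where two such transcripts differ, one option is to compare their intron chains (the gaps between consecutive exons). Below is a minimal Python sketch, assuming whitespace-separated GTF exon records like the ones above, trimmed here to the attributes that matter. The helper names `exons_by_transcript` and `intron_chain` are made up for illustration and are not part of StringTie.

```python
import re

# Exon records copied from the STRG.28 example above (cov attribute trimmed).
GTF = """\
chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "1";
chr1 StringTie exon 732017 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.1"; exon_number "2";
chr1 StringTie exon 729898 729955 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "1";
chr1 StringTie exon 732013 732218 1000 - . gene_id "STRG.28"; transcript_id "STRG.28.2"; exon_number "2";
"""

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_text):
    """Return {transcript_id: sorted list of (start, end)} for exon records."""
    exons = {}
    for line in gtf_text.strip().splitlines():
        fields = line.split(None, 8)  # 8 fixed columns + attribute string
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons.setdefault(match.group(1), []).append((int(fields[3]), int(fields[4])))
    return {tid: sorted(intervals) for tid, intervals in exons.items()}


def intron_chain(exon_list):
    """Return introns as (start, end) gaps between consecutive sorted exons."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exon_list, exon_list[1:])
    ]


chains = {tid: intron_chain(ex) for tid, ex in exons_by_transcript(GTF).items()}
for tid, chain in chains.items():
    print(tid, chain)
print("same intron chain:", chains["STRG.28.1"] == chains["STRG.28.2"])
```

Here the two intron chains differ by 4 bp at one splice site, so the transcripts are not identical records even though their outer coordinates match.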

Here is another case:

chr1    StringTie   transcript  13483   29654   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "27.115095"; FPKM "4.765386"; TPM "8.598089";
chr1    StringTie   exon    13483   15038   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "23.612379";
chr1    StringTie   exon    15796   15947   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "18.194462";
chr1    StringTie   exon    16607   16765   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "13.165168";
chr1    StringTie   exon    16858   17055   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; cov "40.344353";
chr1    StringTie   exon    17233   17368   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; cov "52.639740";
chr1    StringTie   exon    17606   17742   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; cov "49.598957";
chr1    StringTie   exon    17915   18061   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; cov "45.024239";
chr1    StringTie   exon    18268   18366   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "8"; cov "47.735268";
chr1    StringTie   exon    24738   24891   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "9"; cov "16.246500";
chr1    StringTie   exon    29534   29654   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "10"; cov "1.105868";
chr1    StringTie   transcript  13483   29654   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; cov "4.613946"; FPKM "0.810885"; TPM "1.463064";
chr1    StringTie   exon    13483   15038   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "1"; cov "2.968100";
chr1    StringTie   exon    15796   15947   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "2"; cov "2.287062";
chr1    StringTie   exon    16607   16765   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "3"; cov "1.654875";
chr1    StringTie   exon    16858   17055   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "4"; cov "5.071326";
chr1    StringTie   exon    17233   17368   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "5"; cov "6.616868";
chr1    StringTie   exon    17606   17742   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "6"; cov "6.234639";
chr1    StringTie   exon    17915   18061   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "7"; cov "5.659593";
chr1    StringTie   exon    18268   18369   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "8"; cov "6.363258";
chr1    StringTie   exon    18913   24891   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "9"; cov "5.117283";
chr1    StringTie   exon    29534   29654   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; exon_number "10"; cov "0.139009";

Again, these are two transcripts that are almost the same. Exon 9 starts at position 24738 in one and at 18913 in the other, although it ends at the same position in both. Exon 8 also ends 3 bp apart (18366 vs. 18369).

What should I do in this case? Should I consider them a single isoform and add up the TPM values, keep them separate (but then what is the reason they are split?), or simply remove one of them?

This is on a human sample, assembled using hg38.

Thanks in advance for your help


