3 hours ago by Dunois

This is a sort-of-kind-of follow up to this post.

Say I've performed de novo transcriptome assembly with Trinity and quantified the transcripts with Salmon. Let's suppose I also had Salmon produce either Gibbs samples or bootstrap replicates alongside the point estimates.

My first question is: how do I process and interpret the Gibbs/bootstrap data?

I understand that I can preprocess the data with ConvertBootstrapsToTSV.py, but then what? I've looked around, and there doesn't seem to be any documentation (especially for biologists/non-bioinformaticians) on how to make use of this data. When I first read that post, I incorrectly assumed the downstream analysis steps were well documented. The closest thing I could find was this issue on Salmon's GitHub, but that still leaves one hanging less than a quarter of the way in.

I presume that the Gibbs/bootstrap estimates represent the estimated counts that the set of reads at hand would yield for a given set of transcripts over n iterations/draws (conditional upon some set of constraints?). So, if a transcript t is quantified using Salmon with Gibbs sampling enabled (e.g., --numGibbsSamples 4), it would generate a set of counts like so: {20, 44, 22, 18}. I suppose then that the variance of this set is the uncertainty in inference that @Rob mentioned in their post. Is this correct?
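To make concrete what I mean by "the variance of this set", here is a minimal Python sketch using only the standard library. The function name `summarize_draws` is my own hypothetical helper and not part of Salmon or its scripts. The coefficient of variation is just one possible way to put the spread on a relative scale, which I am assuming is reasonable here.

```python
from statistics import mean, variance


def summarize_draws(draws):
    """Summarize one transcript's Gibbs/bootstrap draws.

    Hypothetical helper (not a Salmon API): returns the mean, the sample
    variance (n - 1 denominator), and the coefficient of variation.
    """
    m = mean(draws)
    v = variance(draws)
    # The CV is undefined when the mean is zero, so report NaN in that case.
    cv = (v ** 0.5) / m if m > 0 else float("nan")
    return {"mean": m, "variance": v, "cv": cv}


# The example set from the question: four draws for transcript t.
draws_t = [20, 44, 22, 18]
summary = summarize_draws(draws_t)
print(summary)  # mean is 26, sample variance is 440/3 (about 146.7)
```

If my presumption is right, a transcript whose draws are tightly clustered would show a small variance relative to its mean, while one whose reads are shared ambiguously with other isoforms would show a large one.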

My second question pertains to using these estimates to rule out transcript isoforms (as Trinity defines them) in the context of finding candidate proteins. Basically, I have two transcript isoforms that yield protein sequences that I've identified as homologs of a protein of interest. The problem is that both of them have the same E-value. My first idea was to check the read support for both of them, and potentially drop the one with lower read support. But it turned out that the one with lower read support actually aligns slightly better with the protein of interest than the one with greater read support. So this brings me to the question at hand: if I have the aforementioned estimates for both isoforms, and one of them has high uncertainty in its counts, can I rule that one out as unlikely to be a "real" candidate even if it has greater read support?

Your inputs would be much appreciated.

modified 1 hour ago
