3 hours ago by bryce.thomas

So the RNA-seq data is for a non-model organism. The transcriptome was assembled using Trinity. However, Trinity has labelled the genes with its own made-up identifiers (the TRINITY_DN... names that begin each record below).

TRINITY_DN41182_c0_g1_i1 len=209 path=[1:0-208] [-1, 1, -2]
ATGGTGAGAACTGCCCATGTGATGGAGACTCAGTATGGCCATCTGTTTGAAAAGGTCATA
GTCAACGACGACCTCTCGACCGCCTTCAGCGAGCTGCGGTTGGCACTAAAGAAAGTGGAG
ACGGAGACTCACTGGGTTCCAGTCAGCTGGACCCACTCCTGAGATCCTCACAGACTGTAA
AGGGAGAAAAGGGAAGGACTTTGACAAAA

TRINITY_DN41181_c0_g1_i1 len=207 path=[1:0-206] [-1, 1, -2]
TATGGACCCCCTCCTCCTCCCCCTGGCGAGTACGGCGGCCATGCTGAGTCTCCGGTTGTC
ATGGTGTACGGATTGGACCCCGTCAAGATGAACGCAGACCGTGTCTTCAACATCTTCTGT
CTCTATGGCAACGTAGAGCGGGTCAAGTTCATGAAGAGTAAGCCCGGAGCAGCCATGGTG
GAAATGGGAGACTGTTACGCGGTGGAT
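
For anyone unfamiliar with these labels, here is a minimal sketch of how a header like the ones above can be split into its parts. It assumes the usual Trinity naming convention, where the trailing `_i<N>` marks the isoform and everything before it is Trinity's own "gene" grouping. The function name `parse_trinity_header` is made up for illustration, and none of this says anything yet about which real gene a transcript corresponds to.

```python
import re

# Assumed Trinity naming convention:
# TRINITY_DN<cluster>_c<component>_g<gene>_i<isoform>
TRINITY_ID = re.compile(r"^(TRINITY_DN\d+_c\d+_g\d+)_i(\d+)$")


def parse_trinity_header(header):
    """Split a Trinity FASTA header into transcript ID, gene ID, isoform and length."""
    fields = header.lstrip(">").split()
    transcript_id = fields[0]
    match = TRINITY_ID.match(transcript_id)
    if match is None:
        raise ValueError(f"not a Trinity-style ID: {transcript_id}")

    # The length is carried in a "len=<N>" field, if present.
    length = None
    for field in fields[1:]:
        if field.startswith("len="):
            length = int(field[len("len="):])

    return {
        "transcript_id": transcript_id,
        "gene_id": match.group(1),       # Trinity's grouping, not a reference gene ID
        "isoform": int(match.group(2)),
        "length": length,
    }


record = parse_trinity_header(
    "TRINITY_DN41182_c0_g1_i1 len=209 path=[1:0-208] [-1, 1, -2]"
)
print(record["gene_id"], record["isoform"], record["length"])
```

So the "gene" part here is only Trinity's internal clustering of isoforms; it still has to be matched against a reference to get a recognisable gene ID.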

This means that when you map the reads to the assembled reference, the results are reported against those same Trinity IDs:

target_id                 length  eff_length  est_counts  tpm
TRINITY_DN34124_c0_g1_i1  205     27.253      0           0
TRINITY_DN34120_c0_g1_i1  236     34.7816     15          14.2884

I need to use the sequences to look up gene IDs, but I don't know how to do this. The closest genomes I can find are in the Ensembl database, for S. orbicularis or A. percula, but I don't know how to use these to convert the Trinity output into something meaningful. I'd prefer to use R if possible, since I'm more comfortable with it, but obviously beggars can't be choosers.
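
To make the goal concrete, here is a rough sketch of the kind of conversion I have in mind, written in Python rather than R. It rests on several assumptions: that the Trinity transcripts have already been aligned against a related species' transcript set with BLAST in tabular format (`-outfmt 6`, whose standard twelve columns are listed in the code), and that the quantification table is tab-separated with a `target_id` column. The subject IDs `HYPOTHETICAL_REF_A` and `HYPOTHETICAL_REF_B`, the hit scores, and the function names are all placeholders made up for illustration, not real results.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv):
    """Return {query ID: subject ID}, keeping the highest-bitscore hit per query."""
    best = {}
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        score = float(row["bitscore"])
        if row["qseqid"] not in best or score > best[row["qseqid"]][1]:
            best[row["qseqid"]] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def annotate(abundance_tsv, hits):
    """Add a 'best_hit' column to a quantification table keyed by target_id."""
    rows = []
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        # Transcripts without any hit are marked "NA".
        row["best_hit"] = hits.get(row["target_id"], "NA")
        rows.append(row)
    return rows


# Placeholder inputs: the subject IDs and scores are made up for illustration.
blast = io.StringIO(
    "TRINITY_DN34120_c0_g1_i1\tHYPOTHETICAL_REF_A\t91.2\t230\t20\t0\t1\t230\t5\t234\t1e-80\t300\n"
    "TRINITY_DN34120_c0_g1_i1\tHYPOTHETICAL_REF_B\t85.0\t200\t30\t0\t1\t200\t9\t208\t1e-50\t200\n"
)
abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TRINITY_DN34124_c0_g1_i1\t205\t27.253\t0\t0\n"
    "TRINITY_DN34120_c0_g1_i1\t236\t34.7816\t15\t14.2884\n"
)

for row in annotate(abundance, best_hits(blast)):
    print(row["target_id"], row["tpm"], row["best_hit"])
```

Taking the single best hit by bitscore is only one possible choice; short contigs like these may have no hit at all or several near-equal ones, so I'd welcome suggestions on a more sensible filter.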


