Good afternoon 🙂
This is more of a theory question. My general pipeline is BBDuk.sh -> STAR align -> salmon quant. I am using 3' RNA-seq (Lexogen QuantSeq FWD prep).
The BBDuk.sh step does a poly-A tail trim as well as an adapter trim (all happy here).
I then align the trimmed reads to the reference genome using STAR (again, all happy here).
This is where the Salmon versions start to behave differently.
Using salmon v0.10.0, I get the expected outputs and quantification files:
/nethome/bdy8/programs/salmon-0.10.0_linux_x86_64/bin/salmon
quant
-t /nethome/bdy8/apal_genome/version3.0/Apalm_gffread_for_salmon.fasta
-l SF
-a /scratch/projects/transcriptomics/ben_young/DHE/tagseq/host/aligned/'"$PALPAL"'/'"$PALPAL"'_Aligned.toTranscriptome.out.bam
-o /scratch/projects/transcriptomics/ben_young/DHE/tagseq/host/salmon/'"${PALPAL}"'_salmon
However, using salmon 1.40, I get the following error message:
/nethome/bdy8/programs/salmon-latest_linux_x86_64/bin/salmon quant
-t /nethome/bdy8/apal_genome/Apalm_gffread_for_salmon.fasta
-l SF
--noLengthCorrection
-a /scratch/projects/transcriptomics/ben_young/POR/tagseq/host/aligned_reads/bagnumber-apal-1009/bagnumber-apal-1009_Aligned.toTranscriptome.out.bam
-o /scratch/projects/transcriptomics/ben_young/POR/tagseq/host/test
SAM file says target evm.model.Sc0a5M3_402_HRSCAF_756.4.1.5f5b2bc4 has
length 536, but the FASTA file contains a sequence of length [538 or
537]
My guess is that this results from the poly-A tail trimming at the BBDuk.sh stage. Beyond that, I do not know what is happening, and I was wondering whether anyone has ideas about the cause, or knows a way to ignore this length mismatch of 1 or 2 bases.
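In case it helps with diagnosing this, here is a rough sketch of how the lengths recorded in the BAM header could be compared against the lengths in the FASTA passed to -t. It assumes the header has first been dumped to a text file with samtools view -H; the function name compare_lengths and the file names header.sam and transcripts.fasta are placeholders of mine, not anything from salmon or STAR. I have not run this on the real data, so treat it as a starting point only.

```shell
# Hypothetical helper: report transcripts whose @SQ length in a SAM header
# differs from the sequence length found in a FASTA file.
# Usage: compare_lengths header.sam transcripts.fasta
# Output: name <TAB> header_length <TAB> fasta_length (one line per mismatch)
compare_lengths() {
  awk '
    # First file (SAM header): remember LN for every SN on @SQ lines.
    FNR == NR {
      if ($1 == "@SQ") {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^SN:/) name = substr($i, 4)
          if ($i ~ /^LN:/) len = substr($i, 4)
        }
        sam[name] = len
      }
      next
    }
    # Second file (FASTA): the record name is the first word after ">".
    /^>/ {
      split(substr($0, 2), parts, /[ \t]/)
      cur = parts[1]
      fa[cur] = 0
      next
    }
    # Sequence lines: strip whitespace and add up the residues.
    {
      gsub(/[ \t\r]/, "")
      fa[cur] += length($0)
    }
    END {
      for (n in sam)
        if ((n in fa) && (sam[n] + 0 != fa[n]))
          printf "%s\t%s\t%s\n", n, sam[n], fa[n]
    }
  ' "$1" "$2"
}
```

The idea would be to run samtools view -H on the Aligned.toTranscriptome.out.bam file, redirect that to header.sam, and then call compare_lengths header.sam with the same FASTA given to salmon. If only a handful of transcripts show up, that would point at specific records; if nearly all of them do, the BAM was probably built against a different version of the transcript set than the FASTA.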
If any more information is needed please let me know and I will supply it.