Asked 5 hours ago by Joshi (Australia)
Hi, I would appreciate help with this one.

I downloaded this particular ENCODE RNA-seq dataset (BAM alignment). This is a long-read RNA-seq sample.

  • The original file, ENCFF653FOQ.bam, is about 300 MB
  • To view the RNA-seq file in IGV, I first needed to index it
  • When I tried to index it with samtools index, it reported that the BAM file wasn't sorted
  • After sorting, ENCFF653FOQ.sorted.bam is about 88 MB
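For anyone reproducing the steps above, they can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: `sort_and_index` is a hypothetical helper name, and the output naming (`.sorted.bam`) is assumed from the file names in this post. Only `samtools sort -o` and `samtools index` are actual samtools commands.

```shell
# Sketch only: sort a BAM by coordinate, then index the sorted copy.
# sort_and_index is a hypothetical helper name, not part of samtools.
sort_and_index() {
    in_bam=$1
    out_bam="${in_bam%.bam}.sorted.bam"   # e.g. ENCFF653FOQ.bam -> ENCFF653FOQ.sorted.bam

    # Coordinate sort is the default order for samtools sort; -o names the output file.
    samtools sort -o "$out_bam" "$in_bam" || return 1

    # By default the index is written next to the sorted file as <name>.bam.bai.
    samtools index "$out_bam"
}
```

Usage would be `sort_and_index ENCFF653FOQ.bam`, which should leave ENCFF653FOQ.sorted.bam and its index in the same directory.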

I ran samtools flagstat on both the original and the sorted BAM files and see no difference in the output.
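Since flagstat only reports summary counts, a stricter comparison is to fingerprint the alignment records themselves in a way that ignores their order. The following is a minimal sketch under that assumption: `bam_record_digest` is a hypothetical helper name, and `cksum` is used only because it is available on any POSIX system (a stronger hash such as `md5sum` could be substituted).

```shell
# Sketch only: order-independent fingerprint of the alignment records in a BAM.
# bam_record_digest is a hypothetical helper name, not part of samtools.
bam_record_digest() {
    # samtools view without -h prints the alignment records only (no header),
    # so header lines that sorting rewrites do not affect the result.
    # LC_ALL=C sort puts the text records into a fixed byte order, and cksum
    # then prints a CRC and a byte count for that stream.
    samtools view "$1" | LC_ALL=C sort | cksum
}
```

If `bam_record_digest ENCFF653FOQ.bam` and `bam_record_digest ENCFF653FOQ.sorted.bam` print the same line, that would suggest the two files hold the same records and differ only in order and compressed size.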

What, if anything, is being lost or removed when sorting the long-read RNA-seq file? Is samtools the right tool for handling long-read RNA-seq data?

$ samtools flagstat ENCFF653FOQ.bam
647063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
647063 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat ENCFF653FOQ.sorted.bam
647063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
647063 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

$ ls -l ENCFF653FOQ.bam ENCFF653FOQ.sorted.bam
-rw-r--r-- 1 287M Apr 28 14:09 ENCFF653FOQ.bam
-rw-r--r-- 1  84M Apr 28 19:10 ENCFF653FOQ.sorted.bam

$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
