Asked 4 hours ago by fullmooninu

I have an assembly in a FASTA file. Combining it with the raw reads, we produced a .bam file, which I converted to .sam.

The alignment lines in the .sam file look like this:

A00321:42:HLLVYDSXX:2:2302:6153:3505    99      NODE_1_length_3415511_cov_137.721502    16      60      128M    =       607     742     CGATTAGTCCGGCCAAATCGCCGTCGAGCGCAATGAACATAACGGTCTTGCCCTCAGCGCGCAGCGCATCGGCCTTGGCGTCGATTGTGGAGTGCTCGACGCCCATGATGTCCATCATAGCACCATTG        FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        RX:Z:TTGAGGGTATAGTAGT   QX:Z:FFFFFFFFFFFFFFFF   TR:Z:GACACCG    TQ:Z:FFFFFFF    BC:Z:AGTTGCAG   QT:Z:FFFFFFFF   XS:i:-10        AS:i:0  XM:Z:0  AM:Z:0  XT:i:1  RG:Z:over_1kb:LibraryNotSpecified:1:unknown_fc:0        OM:i:60

Split into the mandatory fields, a record looks something like this (this example is a different read from the line above):

QNAME: A00321:42:HLLVYDSXX:1:1644:2248:3881
FLAG: 99
RNAME: NODE_1_length_3415511_cov_137.721502
POS: 1
MAPQ: 60
CIGAR: 1S127M
RNEXT: =
PNEXT: 536
TLEN: 386
SEQ:  ATCGGGTCTGACACCGCGATTAGTCCGGCCAAATCGCCGTCGAGCGCAATGAACATAACGGTCTTGCCCTCAGCGCGCAGCGCATCGGCCTTGGCGTCGATTGTGGAGTGCTCGACGCCCATGATGTC
QUAL: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

I'm actually interested in the metadata: I want to know how the RX and BC tags are distributed across the scaffolds of the original assembly.

I assumed that the .sam file already contains the information about the assembly used to produce it. If that assumption is wrong, please correct me.

What I want to do is this: for each read in the .sam file, find its position in the assembled scaffold and record Read_ID, Scaffold_ID, Read_Position_Inside_Scaffold, RX, BC.
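A minimal sketch of that extraction in Python, assuming the standard tab-separated SAM layout, in which column 3 (RNAME) names the reference sequence the read aligned to (here, the scaffold) and column 4 (POS) is the 1-based leftmost mapping position on that reference. The function names `parse_sam_line` and `write_table` are hypothetical, and skipping unmapped reads is a choice made for this sketch, not a requirement.

```python
import csv

def parse_sam_line(line):
    """Return (read_id, scaffold_id, pos, rx, bc), or None for lines to skip."""
    if line.startswith("@"):          # header lines start with '@'
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:              # the 11 mandatory fields must be present
        return None
    qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    if flag & 0x4 or rname == "*":    # assumption: ignore unmapped reads
        return None
    tags = {}
    for item in fields[11:]:          # optional fields look like TAG:TYPE:VALUE
        parts = item.split(":", 2)    # split twice only; VALUE may contain ':'
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return qname, rname, pos, tags.get("RX"), tags.get("BC")

def write_table(sam_lines, out):
    """Write one CSV row per mapped read to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["Read_ID", "Scaffold_ID",
                     "Read_Position_Inside_Scaffold", "RX", "BC"])
    for line in sam_lines:
        row = parse_sam_line(line)
        if row is not None:
            writer.writerow(row)
```

Calling `write_table` with an open .sam file and an open output file would produce the table described above. Reads without an RX or BC tag get an empty cell in that column.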

Then I want to use that database to analyse the distribution of RX and BC inside each scaffold.
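One possible way to look at that distribution, sketched under assumptions: bin each read's position into fixed-size windows along its scaffold and count how often each BC value appears per window. The function name `barcode_counts_by_window`, the record layout (Read_ID, Scaffold_ID, position, RX, BC) and the 10,000 bp default window are all illustrative choices, not something the data prescribes.

```python
from collections import Counter, defaultdict

def barcode_counts_by_window(records, window=10_000):
    """Count BC values per (scaffold, window index).

    `records` is an iterable of (read_id, scaffold_id, pos, rx, bc) tuples,
    with `pos` 1-based. Window 0 covers positions 1..window, and so on.
    """
    counts = defaultdict(Counter)
    for _read_id, scaffold, pos, _rx, bc in records:
        if bc is None:                      # read carries no BC tag
            continue
        window_index = (pos - 1) // window  # convert 1-based pos to a bin
        counts[(scaffold, window_index)][bc] += 1
    return counts
```

The same grouping works for RX by counting the fourth element of each record instead of the fifth.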

That's what I want.

Ultimately, I'm trying to evaluate the quality of my assemblies based on the barcode distribution.

I'm comfortable with programming and parsing; I'm just having trouble figuring out where in the .sam file I can find the scaffold, and the position within that scaffold, for each read.


