Read group: ID, PU definition and multi-lane (same sample)


Hi,

I'd like confirmation that what I'm doing is correct.

1) I found several definitions of ID and PU; this is the one I'm going to use:

An example: @A00155:140:HHTKFDSXX:1:1101:3423:1000 1:N:0:CAGTGACT+CGAGGCGT

PU1=A00155 ### instrument
PU2=140 ### run
FL=HHTKFDSXX ### flowcell
LN=1 ### lane
LB ### library ID

ID=${PU1}.${PU2}
PU=${FL}.${LN}.${PU2}
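
To make this concrete, here is a minimal sketch of how I would pull those fields out of the read header above and build ID and PU exactly as defined. It assumes the standard colon-separated Illumina header layout; the variable `header` and the helper variable `fields` are just names I chose for illustration.

```shell
# Example read header taken from the FASTQ file.
header='@A00155:140:HHTKFDSXX:1:1101:3423:1000 1:N:0:CAGTGACT+CGAGGCGT'

# Drop the leading '@' and everything after the first space.
fields=${header#@}
fields=${fields%% *}

# Split the remaining token on ':' into positional parameters.
old_ifs=$IFS
IFS=:
set -- $fields
IFS=$old_ifs

PU1=$1   # instrument
PU2=$2   # run
FL=$3    # flowcell
LN=$4    # lane

# Build the read group fields as defined in this post.
ID="${PU1}.${PU2}"
PU="${FL}.${LN}.${PU2}"

echo "ID=${ID} PU=${PU}"
```

For the example header this gives ID=A00155.140 and PU=HHTKFDSXX.1.140.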

2) When I have multi-lane data, I saw that LB is important for the MarkDuplicates step and ID/PU for BQSR.

  • For MarkDuplicates, I have to give as input all BAM files from the same LB (of the same sample), correct?

  • When I have two libraries for the same sample, I run MarkDuplicates separately for each library and then give both output files (from MarkDuplicates) as input to BQSR, correct?
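
To show the plan I have in mind, here is a rough sketch that only prints the commands instead of running them. All file names (`sample.lib1.lane1.bam`, `ref.fa`, `known.vcf.gz`, and so on) and the helper function `md_cmd` are hypothetical placeholders, and whether this grouping is right is exactly what I'm asking.

```shell
# Hypothetical per-lane BAMs, grouped by library (placeholder names).
LIB1_LANES="sample.lib1.lane1.bam sample.lib1.lane2.bam"
LIB2_LANES="sample.lib2.lane1.bam"

# Build one MarkDuplicates command per library:
# every lane BAM of that library is passed with its own -I.
md_cmd() {
    lib=$1
    shift
    cmd="gatk MarkDuplicates"
    for bam in "$@"; do
        cmd="$cmd -I $bam"
    done
    echo "$cmd -O sample.$lib.md.bam -M sample.$lib.metrics.txt"
}

MD1=$(md_cmd lib1 $LIB1_LANES)
MD2=$(md_cmd lib2 $LIB2_LANES)

# BQSR model building on both duplicate-marked library BAMs of the sample.
BQSR="gatk BaseRecalibrator -I sample.lib1.md.bam -I sample.lib2.md.bam -R ref.fa --known-sites known.vcf.gz -O sample.recal.table"

# Dry run: print the commands only.
printf '%s\n' "$MD1" "$MD2" "$BQSR"
```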

Many thanks for your time!


multilane


group_read


RG
