2 hours ago by egill.richard

Hello everyone,

I am interested in detecting deletion events in sequencing data, more specifically PacBio data. I searched for a simple way to detect deletions in a SAM file and extract their positions, but I could not find anything despite the apparent simplicity of the problem.

As input I have a SAM file that contains, in particular:
1) the position of the first base of each read
2) the CIGAR string, where M stands for a match, I for an insertion and D for a deletion

What I want is to get, for each read in my SAM file, the start and end position of every deletion.

Input :

pos    CIGAR
1000   200M200D300M

Output :

deletion_start      deletion_end
1200                1399

I feel that it can be done with a few lines of Python, but I am just learning this language. Once I have my table of deletions I will be able to continue on my own, since I know R much better, but for this step I would need your help.
If there is a tool that does exactly this and that I missed, even better!
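
To make the request concrete, here is a minimal Python sketch of the computation described above. It is an illustration under stated assumptions, not a tested tool: the function names `deletions` and `deletions_from_sam` are made up for this example, the file is assumed to be a plain-text, tab-separated SAM file with POS in column 4 and CIGAR in column 6, and coordinates are reported as 1-based inclusive. Under that convention, 200M starting at position 1000 covers reference positions 1000 to 1199, so the sketch reports the example deletion as 1200 to 1399; shift the values by one if a different convention is intended.

```python
import re

# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")
# One CIGAR element: a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions(pos, cigar):
    """Return a list of (start, end) reference positions, 1-based inclusive,
    one per D operation. Hypothetical helper written for this example."""
    found = []
    ref = pos  # reference position of the next base to be consumed
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "D":
            found.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return found


def deletions_from_sam(path):
    """Yield (read_name, deletion_start, deletion_end) for every deletion
    in a plain-text SAM file. Hypothetical helper written for this example."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):  # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6 or fields[5] == "*":  # no alignment or CIGAR
                continue
            name, pos, cigar = fields[0], int(fields[3]), fields[5]
            for start, end in deletions(pos, cigar):
                yield name, start, end


print(deletions(1000, "200M200D300M"))  # [(1200, 1399)]
```

The output of `deletions_from_sam` could be written to a tab-separated file and then read into R for the downstream analysis.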

Thank you very much.
