vkkodali, United States, 3 hours ago

Starting with a set of features in transcript coordinates, along with an alignment of the transcript to the genome, is there a way to get the features in genomic coordinates?

For example, I have a bed file with features of interest as follows:

tx1    10    25    feat1    100    +
tx1    45    95    feat2    100    +

And I have an alignment file, say, in BAM format, with tx1 aligned to chr1. Note that tx1 is a multi-exon transcript, so its alignment to chr1 is interrupted by intronic regions. What I am trying to get is an output bed file with my features in chromosome coordinates that looks something like:

chr1    1500    1525    feat1    100    +
chr1    1945    1995    feat2    100    +

Notes:

  • I am flexible with input, output and alignment formats.
  • I would prefer a solution that does not rely on any existing annotation as both tx1 and chr1 may be arbitrary sequences that are outside the scope of the standard databases.
  • tx1 is multi-exonic and the features can span two or more adjacent exons, so the output should have multiple rows for such split features.
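
To make the split-feature requirement concrete, here is a minimal Python sketch of the mapping I have in mind. It assumes the alignment has already been reduced to a list of exon blocks, and that tx1 aligns to the forward strand of chr1. The names `Block`, `lift_interval` and `lift_bed_line` and the block coordinates are hypothetical, made up for illustration; they are not taken from any existing tool or from my real data.

```python
from typing import List, Tuple

# Hypothetical exon block: (tx_start, tx_end, chrom, chrom_start).
# Coordinates are 0-based, half-open (BED style). Assumes the transcript
# aligns to the forward strand, with no indels inside a block.
Block = Tuple[int, int, str, int]


def lift_interval(start: int, end: int, blocks: List[Block]) -> List[Tuple[str, int, int]]:
    """Map a transcript interval to one genomic interval per overlapped exon block."""
    pieces = []
    for tx_start, tx_end, chrom, chrom_start in blocks:
        # Overlap between the feature and this block, in transcript coordinates.
        ov_start = max(start, tx_start)
        ov_end = min(end, tx_end)
        if ov_start < ov_end:
            # Shift from transcript to genomic coordinates within this block.
            offset = chrom_start - tx_start
            pieces.append((chrom, ov_start + offset, ov_end + offset))
    return pieces


def lift_bed_line(line: str, blocks: List[Block]) -> List[str]:
    """Convert one BED line in transcript coordinates to one or more genomic BED lines."""
    fields = line.split()
    start, end = int(fields[1]), int(fields[2])
    # Name, score and strand columns are carried over unchanged.
    return [
        "\t".join([chrom, str(s), str(e)] + fields[3:])
        for chrom, s, e in lift_interval(start, end, blocks)
    ]


# Made-up blocks: two exons separated by an intron on chr1.
blocks = [(0, 30, "chr1", 1490), (30, 100, "chr1", 1930)]

# A hypothetical feature crossing the exon boundary yields two output rows.
for row in lift_bed_line("tx1\t20\t40\tfeat3\t100\t+", blocks):
    print(row)
```

With these made-up blocks, a feature at tx1:20-40 crosses the exon junction at transcript position 30 and comes out as two rows, one per exon. A real solution would also need to derive the blocks from the alignment itself and handle transcripts aligned to the reverse strand, which this sketch does not attempt.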


