Posted by waqaskhokhar999

I have a large tab-delimited file, and part of it looks like this:

25      M   X   A   A   X   S
25_a    M   K   A   A   R   S
25_b    M   A   A   A   V   S
31      M   A   A   A   V   S
31_a    M   A   A   A   V   S
31_b    M   A   A   A   V   S

I want to process the file three rows at a time. The first row contains a reference (actual) sequence, and the next two rows contain its variants. I am trying to do two things:
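Here is a minimal Python sketch of the "three rows at a time" part. It assumes the file is strictly ordered as reference, variant, variant, with no header line. The names `parse_row` and `triples` are hypothetical helpers, not part of any library. Blank lines are skipped, and an incomplete group at the end of the file raises an error instead of being silently dropped.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Row = List[str]


def parse_row(line: str) -> Row:
    """Split one tab-delimited line into [id, residue1, residue2, ...]."""
    return line.rstrip("\n").split("\t")


def triples(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield consecutive groups of three rows: [reference, variant_a, variant_b].

    Assumption (hypothetical layout): every reference row is followed by
    exactly two variant rows, with no header line.
    """
    rows = (parse_row(line) for line in lines if line.strip())
    while True:
        group = list(islice(rows, 3))  # take up to the next three rows
        if not group:
            return
        if len(group) != 3:
            raise ValueError(f"incomplete group at end of file: {group}")
        yield group
```

Typical use would be `for ref, var_a, var_b in triples(open("input.tsv")):`, where `input.tsv` is a placeholder file name.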

First, using the reference row (25), I want to find the positions of a character (X) and keep only the characters at those same positions in the two rows below it (25_a, 25_b), to get something like this:

25      M   X   A   A   X   S
25_a        K           R   
25_b        A           V
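The first rule amounts to: collect the column indices where the reference has X, then blank every other residue in each variant row. This sketch assumes each row has already been split on tabs into `[id, residue, residue, ...]`. The names `x_positions` and `mask_variant` are hypothetical. Blanked cells become empty strings so the tab-delimited columns stay aligned; pass `blank=" "` or `blank="-"` if a visible placeholder is preferred.

```python
from typing import List


def x_positions(reference: List[str], marker: str = "X") -> List[int]:
    """Column indices (the ID column is index 0) where the reference has `marker`."""
    return [i for i, res in enumerate(reference[1:], start=1) if res == marker]


def mask_variant(variant: List[str], keep: List[int], blank: str = "") -> List[str]:
    """Keep the ID plus the residues at `keep`; replace every other residue with `blank`."""
    keep_set = set(keep)
    return [variant[0]] + [
        res if i in keep_set else blank
        for i, res in enumerate(variant[1:], start=1)
    ]
```

For row 25, `x_positions` returns columns 2 and 5. Masking row 25_a with those columns and joining with tabs gives `25_a`, an empty field, `K`, two empty fields, `R`, and one trailing empty field, which matches the 25_a line above.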

Second, if the reference row (31) contains no X, I want to remove its two variant rows (31_a, 31_b), to get this:

31      M   A   A   A   V   S

The final output should look like this:

25      M   X   A   A   X   S
25_a        K           R   
25_b        A           V   
31      M   A   A   A   V   S
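Putting both rules together, here is an end-to-end sketch that reproduces the final output from the sample. It groups rows by ID prefix rather than by counting lines. This assumes variant IDs have the form `<reference>_<suffix>` (so `25_a` and `25_b` belong to `25`) and that the reference row comes first in each group. As a side effect, it also copes with references that have more or fewer than two variants. The script name `mask_variants.py` and the file names are placeholders, and the helper names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List

Row = List[str]


def base_id(row_id: str) -> str:
    """'25_a' -> '25', '25' -> '25'. Assumes variant IDs look like '<reference>_<suffix>'."""
    return row_id.split("_", 1)[0]


def group_by_reference(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield [reference, variant, ...] groups of consecutive rows sharing a base ID."""
    group: List[Row] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = line.rstrip("\n").split("\t")
        if group and base_id(row[0]) != base_id(group[0][0]):
            yield group
            group = []
        group.append(row)
    if group:
        yield group


def process_group(group: List[Row], marker: str = "X") -> List[Row]:
    """Apply both rules to one reference row (group[0]) and its variants."""
    reference, variants = group[0], group[1:]
    keep = {i for i, res in enumerate(reference[1:], start=1) if res == marker}
    if not keep:
        # Rule 2: no marker in the reference, so drop all of its variant rows.
        return [reference]
    # Rule 1: keep only residues aligned with a marker column; blank the rest.
    masked = [
        [v[0]] + [res if i in keep else "" for i, res in enumerate(v[1:], start=1)]
        for v in variants
    ]
    return [reference] + masked


def run(lines: Iterable[str]) -> List[str]:
    """Return the output as tab-joined lines (without trailing newlines)."""
    return ["\t".join(row) for group in group_by_reference(lines) for row in process_group(group)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder names): python mask_variants.py input.tsv > output.tsv
    with open(sys.argv[1]) as fh:
        for out_line in run(fh):
            print(out_line)
```

If a reference ID can itself contain an underscore, `base_id` would need a different splitting rule for this grouping to stay correct.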

I tried using sed, which let me remove the data after the X character within the same row, but I am struggling to get the desired output. I also posted the question here, but it was closed because I was not able to explain it well. Any help would be highly appreciated.