Posted 2 hours ago by arriyaz.nstu

I have a tab-delimited text file that contains my data, with many columns and lines. To explain my problem, I'm using a dummy file here.
I want to perform the following steps in order:

  1. Sort the data file by the values in column A and, separately, by column B (highest value first, as in the example tables). This step may generate two sorted copies of my original data, but these intermediate files are not required.

  2. Extract the top five rows from each sorted copy and write them to two files (for example, TopA and TopB).

  3. These two files share some numbers in the first (Pos.) column. In the example, 302, 941 and 699 appear in the Pos. column of both TopA and TopB. My goal is to extract only the rows whose Pos. value is common to all of these files and save them in a result.txt file.

Could anyone please help me with bash/Perl/Python code to get this result?
Thanks in advance.

Datafile

Pos.  DNA      %GC   A      B     C
644   CGGAGGU  52.6  0.876  76.2  102.3
302   GGUACGG  31.6  0.883  83.6  100.9
1067  GCUUAGU  42.1  0.873  76.6  99.7
1191  GGAGCUG  42.1  0.872  75.3  99.3
105   GACACUG  52.6  0.84   68.1  98.6
941   CCGCAAU  42.1  0.879  76.8  98.2
961   GCGUUUG  36.8  0.861  78    98.2
699   CGACGAA  36.8  0.875  84.7  98.1
663   GGAUAUC  47.4  0.867  77.5  97.1
566   GCUUCGA  52.6  0.802  62.6  96.7

TopA

Pos.  DNA      %GC   A      B
302   GGUACGG  31.6  0.883  83.6
941   CCGCAAU  42.1  0.879  76.8
644   CGGAGGU  52.6  0.876  76.2
699   CGACGAA  36.8  0.875  84.7
1067  GCUUAGU  42.1  0.873  76.6

TopB

Pos.  DNA      %GC   A      B
699   CGACGAA  36.8  0.875  84.7
302   GGUACGG  31.6  0.883  83.6
961   GCGUUUG  36.8  0.861  78
663   GGAUAUC  47.4  0.867  77.5
941   CCGCAAU  42.1  0.879  76.8

Result

Pos.  DNA      %GC   A      B
302   GGUACGG  31.6  0.883  83.6
941   CCGCAAU  42.1  0.879  76.8
699   CGACGAA  36.8  0.875  84.7
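
For reference, the three steps could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the input is tab-delimited with a single header line, "top" means the largest values, and ties are broken by input order. The function names and the file names datafile.txt, TopA.txt and TopB.txt are hypothetical. Unlike the example tables, it keeps every column (including C).

```python
import csv

def read_rows(lines):
    """Parse tab-delimited lines into (header, rows); blank lines are skipped."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]
    return header, rows

def top_n(header, rows, column, n=5):
    """Return the n rows with the largest numeric value in the named column."""
    idx = header.index(column)
    return sorted(rows, key=lambda r: float(r[idx]), reverse=True)[:n]

def common_rows(top_a, top_b, key_idx=0):
    """Keep the rows of top_a whose key (Pos., column 0) also occurs in top_b."""
    keys_b = {r[key_idx] for r in top_b}
    return [r for r in top_a if r[key_idx] in keys_b]

def write_rows(path, header, rows):
    """Write a header and rows back out as a tab-delimited file."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)

def run(infile="datafile.txt"):
    """Hypothetical driver: writes TopA.txt, TopB.txt and result.txt."""
    with open(infile, newline="") as handle:
        header, rows = read_rows(handle)
    top_a = top_n(header, rows, "A")
    top_b = top_n(header, rows, "B")
    write_rows("TopA.txt", header, top_a)
    write_rows("TopB.txt", header, top_b)
    write_rows("result.txt", header, common_rows(top_a, top_b))
```

Calling `run("datafile.txt")` would then produce the three output files. The common rows come out in the order they appear in the A-sorted list.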


