I have a tab-delimited text file that contains my data in many columns and rows. To explain my problem, I'm using a dummy file here.
This file has several columns. I want to perform the following steps in order:
- Sort the data file by the values in columns A and B. This step may generate two sorted copies of my original data, but these files are not mandatory.
- Then extract the top five rows from each sorted copy and generate two files (for example, TopA and TopB).
- You may notice that these files share some numbers in the first (Pos.) column. In the example, the numbers 302, 941 and 699 appear in the Pos. column of both files (TopA and TopB). My goal is to extract only the rows whose Pos. number is common to all files and save them in a result.txt file.
Could anyone please help me with bash/Perl/Python code to get this result?
Thanks in advance.
Datafile
Pos. DNA %GC A B C
644 CGGAGGU 52.6 0.876 76.2 102.3
302 GGUACGG 31.6 0.883 83.6 100.9
1067 GCUUAGU 42.1 0.873 76.6 99.7
1191 GGAGCUG 42.1 0.872 75.3 99.3
105 GACACUG 52.6 0.84 68.1 98.6
941 CCGCAAU 42.1 0.879 76.8 98.2
961 GCGUUUG 36.8 0.861 78 98.2
699 CGACGAA 36.8 0.875 84.7 98.1
663 GGAUAUC 47.4 0.867 77.5 97.1
566 GCUUCGA 52.6 0.802 62.6 96.7
TopA
Pos. DNA %GC A B
302 GGUACGG 31.6 0.883 83.6
941 CCGCAAU 42.1 0.879 76.8
644 CGGAGGU 52.6 0.876 76.2
699 CGACGAA 36.8 0.875 84.7
1067 GCUUAGU 42.1 0.873 76.6
TopB
Pos. DNA %GC A B
699 CGACGAA 36.8 0.875 84.7
302 GGUACGG 31.6 0.883 83.6
961 GCGUUUG 36.8 0.861 78
663 GGAUAUC 47.4 0.867 77.5
941 CCGCAAU 42.1 0.879 76.8
Result
Pos. DNA %GC A B
302 GGUACGG 31.6 0.883 83.6
941 CCGCAAU 42.1 0.879 76.8
699 CGACGAA 36.8 0.875 84.7
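
For reference, the steps described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: "top five" is taken to mean the five largest values (descending sort), which matches the TopA and TopB examples; the input is assumed to be a tab-delimited file named datafile.txt; the output names TopA.txt and TopB.txt and all function names are made up for illustration; and column C is dropped from the output because the example files do not show it.

```python
import csv


def parse_table(lines):
    """Split tab-delimited lines into a header and a list of data rows."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def top_n(header, rows, column, n=5):
    """Return the n rows with the largest numeric value in `column`."""
    idx = header.index(column)
    return sorted(rows, key=lambda row: float(row[idx]), reverse=True)[:n]


def common_rows(tops, key_idx=0):
    """Keep the rows of the first list whose key appears in every list."""
    shared = set.intersection(*({row[key_idx] for row in top} for top in tops))
    return [row for row in tops[0] if row[key_idx] in shared]


def write_table(path, header, rows, keep):
    """Write only the columns named in `keep`, tab-delimited."""
    cols = [header.index(name) for name in keep]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(keep)
        writer.writerows([row[i] for i in cols] for row in rows)


def run(datafile="datafile.txt"):
    """Hypothetical driver: file names here are assumptions, not requirements."""
    with open(datafile, newline="") as fh:
        header, rows = parse_table(fh)
    keep = ["Pos.", "DNA", "%GC", "A", "B"]  # column C is left out, as in the examples
    top_a = top_n(header, rows, "A")
    top_b = top_n(header, rows, "B")
    write_table("TopA.txt", header, top_a, keep)
    write_table("TopB.txt", header, top_b, keep)
    # Rows are kept in TopA order; only Pos. values present in both lists survive.
    write_table("result.txt", header, common_rows([top_a, top_b]), keep)
```

Calling `run("datafile.txt")` would write the three files. The common rows are taken in TopA order, which is the order shown in the Result example; if more columns need to be compared, further `top_n` results could be appended to the list passed to `common_rows`.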