I want to split a matrix called 'matrix' into chunks based on the values in the first column, 'GENE', and save each chunk as a separate .gz file. Each chunk should contain the header line plus all the lines for 3 consecutive GENEs; only the last chunk may contain fewer, as shown in the example below. The solution should be a Bash script.

Input:

> matrix
GENE Individual Expr1 Expr2 Expr3
ENSG1 indv1 0.1 0.2 0.3
ENSG1 indv2 0.1 0.2 0.3
ENSG2 indv1 0.1 0.2 0.3
ENSG2 indv2 0.1 0.2 0.3
ENSG3 indv1 0.1 0.2 0.3
ENSG3 indv2 0.1 0.2 0.3
ENSG4 indv1 0.1 0.2 0.3
ENSG4 indv2 0.1 0.2 0.3
ENSG5 indv1 0.1 0.2 0.3
ENSG5 indv2 0.1 0.2 0.3
ENSG6 indv1 0.1 0.2 0.3
ENSG6 indv2 0.1 0.2 0.3
ENSG7 indv1 0.1 0.2 0.3
ENSG7 indv2 0.1 0.2 0.3
ENSG8 indv1 0.1 0.2 0.3
ENSG8 indv2 0.1 0.2 0.3
ENSG9 indv1 0.1 0.2 0.3
ENSG9 indv2 0.1 0.2 0.3
ENSG10 indv1 0.1 0.2 0.3
ENSG10 indv2 0.1 0.2 0.3

Outputs:

> matrix.chunk1
GENE Individual Expr1 Expr2 Expr3
ENSG1 indv1 0.1 0.2 0.3
ENSG1 indv2 0.1 0.2 0.3
ENSG2 indv1 0.1 0.2 0.3
ENSG2 indv2 0.1 0.2 0.3
ENSG3 indv1 0.1 0.2 0.3
ENSG3 indv2 0.1 0.2 0.3

> matrix.chunk2
GENE Individual Expr1 Expr2 Expr3
ENSG4 indv1 0.1 0.2 0.3
ENSG4 indv2 0.1 0.2 0.3
ENSG5 indv1 0.1 0.2 0.3
ENSG5 indv2 0.1 0.2 0.3
ENSG6 indv1 0.1 0.2 0.3
ENSG6 indv2 0.1 0.2 0.3

> matrix.chunk3
GENE Individual Expr1 Expr2 Expr3
ENSG7 indv1 0.1 0.2 0.3
ENSG7 indv2 0.1 0.2 0.3
ENSG8 indv1 0.1 0.2 0.3
ENSG8 indv2 0.1 0.2 0.3
ENSG9 indv1 0.1 0.2 0.3
ENSG9 indv2 0.1 0.2 0.3

> matrix.chunk4
GENE Individual Expr1 Expr2 Expr3
ENSG10 indv1 0.1 0.2 0.3
ENSG10 indv2 0.1 0.2 0.3

I would appreciate any suggestion.
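
To make the requirement concrete, here is a minimal sketch of one possible approach using awk inside a Bash function. It rests on several assumptions that are not stated above: rows for the same GENE are contiguous, the first line is the header, the columns are whitespace-separated, and the file name contains no shell-special characters. The function name `split_by_gene` and the output naming `<input>.chunk<k>.gz` are illustrative choices only.

```shell
#!/usr/bin/env bash
# Sketch: split a table into gzipped chunks holding a fixed number of genes each.
# Assumptions (hypothetical, not from the post): same-GENE rows are contiguous,
# line 1 is the header, and outputs are named <input>.chunk<k>.gz.

split_by_gene() {
    local input=$1            # path to the matrix file
    local per_chunk=${2:-3}   # distinct GENE values per chunk (default 3)

    awk -v n="$per_chunk" -v prefix="$input" '
        NR == 1 { header = $0; next }      # remember the header line
        NF == 0 { next }                   # ignore blank lines
        $1 != prev {                       # first row of a new gene
            prev = $1
            if (genes % n == 0) {          # start a new chunk every n genes
                if (cmd != "") close(cmd)  # let the previous gzip finish
                chunk++
                cmd = "gzip -c > \"" prefix ".chunk" chunk ".gz\""
                print header | cmd         # every chunk starts with the header
            }
            genes++
        }
        { print | cmd }                    # append the row to the current chunk
        END { if (cmd != "") close(cmd) }  # flush the last chunk
    ' "$input"
}

# Run only when arguments are given: script.sh <matrix-file> [genes-per-chunk]
if [ "$#" -gt 0 ]; then
    split_by_gene "$@"
fi
```

With the example input, calling `split_by_gene matrix 3` should write `matrix.chunk1.gz` through `matrix.chunk4.gz`, the last one holding only ENSG10. If the rows were not grouped by GENE, the file would need to be sorted on the first column (keeping the header aside) before splitting.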

written 2 hours ago by r.tor