How to determine and output whether numbers in a matrix are above a certain value?


Hello,

I am working on a matrix in which the rows are 3000 individuals and the columns are coverage depths at millions of sites.

Usually I would use R to process this dataset, but it is too big for R to handle.

For each value in this huge matrix, I need to know whether it is >=10 or <10. I would like to get back a matrix with rows as samples and columns as sites, where each cell is 1 if the depth is >=10 and 0 if the depth is < 0. How could I do this?

Thank you very much!



You could use awk. Starting with an example matrix:

% echo -e '10\t11\t1\t9\t10\n1\t2\t14\t12\t99' > matrix.txt
% cat matrix.txt
10  11  1   9   10
1   2   14  12  99

Then you can threshold it like so:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt
1   1   0   0   1
0   0   1   1   1

To write it to a file:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt > answer.txt
% cat answer.txt
1   1   0   0   1
0   0   1   1   1
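
Since awk reads and writes one row at a time, the same command can also sit in the middle of a pipeline, so a large matrix never has to be held in memory or stored uncompressed. Here is a minimal sketch, assuming the matrix is kept as a gzip-compressed, tab-delimited file; the names matrix.txt.gz and answer.txt.gz are hypothetical:

```shell
# Build a small gzip-compressed example matrix (file names are hypothetical).
printf '10\t11\t1\t9\t10\n1\t2\t14\t12\t99\n' | gzip -c > matrix.txt.gz

# Decompress to a stream, threshold every field, and recompress the result.
gzip -dc matrix.txt.gz \
  | awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 \
      '{ for (i = 1; i <= NF; i++) $i = ($i >= THRESHOLD) ? 1 : 0; print }' \
  | gzip -c > answer.txt.gz

# Inspect the thresholded matrix.
gzip -dc answer.txt.gz
```

With millions of columns per row, it is worth checking that your awk implementation does not cap the number of fields per record; gawk is usually a safe choice here.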

I'm assuming the "depth < 0" condition in your question is a typo for "depth < 10", since otherwise values from 0 to 9 would fall into neither case.
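
If your real file also has labels, the loop needs to skip them. The sketch below assumes a layout the question does not actually describe: a header row of site names and a first column of sample IDs. The file names and labels are made up for illustration:

```shell
# Hypothetical layout: header row of site names, first column of sample IDs.
printf 'sample\tsite1\tsite2\tsite3\nind1\t10\t3\t25\nind2\t0\t12\t9\n' > matrix_with_ids.txt

awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '
  NR == 1 { print; next }   # pass the header row through unchanged
  {
    # start at field 2 so the sample ID in column 1 is left alone
    for (i = 2; i <= NF; i++) $i = ($i >= THRESHOLD) ? 1 : 0
    print
  }
' matrix_with_ids.txt > answer_with_ids.txt

cat answer_with_ids.txt
```

Every depth from 0 to 9 maps to 0 here, which matches the reading of the question as ">=10 versus <10".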

