# How to determine and output if numbers in a matrix are above a certain value?

Hello,

I am working on a matrix whose rows are 3000 individuals and whose columns are the coverage depth at millions of sites.

Usually I would use R to process this dataset, but it is too big for R to handle.

For each value in this huge matrix, I need to know whether it is >=10 or <10. I would like to return a matrix with rows as samples and columns as sites, with each cell set to 1 if the depth is >=10 and to 0 if the depth is < 0. How could I do this?

Thank you very much!

You could use `awk`. Starting with an example matrix:

``````
% echo -e '10\t11\t1\t9\t10\n1\t2\t14\t12\t99' > matrix.txt
% cat matrix.txt
10  11  1   9   10
1   2   14  12  99
``````

Then you can threshold it like so:

``````
% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { ($i >= THRESHOLD) ? $i = 1 : $i = 0; } print $0; }' matrix.txt
1   1   0   0   1
0   0   1   1   1
``````
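Because `awk` reads one line at a time, memory use depends on the length of a row rather than on the number of rows. If the real file also has a header row of site names and a leading column of sample IDs (both are assumptions, since the question does not describe the file layout), a minimal sketch of the same idea could look like this. The file names `depth.tsv` and `depth_binary.tsv` are hypothetical.

```shell
#!/bin/sh
# Hypothetical input layout (an assumption, not stated in the question):
# tab-separated, a header row of site names, sample IDs in column 1.
printf 'sample\tsite1\tsite2\tsite3\n' >  depth.tsv
printf 'ind1\t10\t11\t1\n'             >> depth.tsv
printf 'ind2\t1\t2\t14\n'              >> depth.tsv

awk -v FS='\t' -v OFS='\t' -v THRESHOLD=10 '
    NR == 1 { print; next }            # pass the header through unchanged
    {
        for (i = 2; i <= NF; i++)      # start at 2 to leave the sample ID alone
            $i = ($i >= THRESHOLD) ? 1 : 0
        print
    }
' depth.tsv > depth_binary.tsv

cat depth_binary.tsv
```

The loop starts at field 2 so the ID column is never compared against the threshold, and the `NR == 1` rule keeps the site names intact. Drop either rule if the file has no header or no ID column.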

To write it to a file:

``````
% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { ($i >= THRESHOLD) ? $i = 1 : $i = 0; } print $0; }' matrix.txt > answer.txt
``````

I'm assuming a typo in your question, because of the gap in conditions between the `0` and `10` cases.