Asked 3 hours ago by xiaoyonf, Baylor College of Medicine, Houston, Texas, USA

Hi, I have 1000 txt files, each with two columns: a gene symbol column and a mutation status column. I want to join all of these files into one file that contains the gene symbol column first, followed by the 1000 sample columns of mutation status. For example, I want to join the following two input files:

Gene    Sample1
A       yes
B       yes
D       yes

Gene    Sample2
B       yes
C       yes
E       yes

into this output file:

Gene    Sample1    Sample2
A       yes        NA
B       yes        yes
C       NA         yes
D       yes        NA
E       NA         yes

I know the full_join solution in R using the dplyr package, but it needs to read all of the files into R. Does anyone have a simple solution in Unix to do this?

Thanks a lot!
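
One possible approach is a single awk pass that performs a full outer join on the gene column. The sketch below is only an illustration under several assumptions: every file is whitespace-delimited, its first line is a header whose second field is the sample name, gene symbols contain no spaces, and each gene appears at most once per file. The helper name merge_samples and the demo file names are hypothetical. awk still keeps the merged table in memory, which should be small for a gene-by-1000-sample matrix.

```shell
# Demo inputs copied from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf 'Gene Sample1\nA yes\nB yes\nD yes\n' > "$workdir/sample1.txt"
printf 'Gene Sample2\nB yes\nC yes\nE yes\n' > "$workdir/sample2.txt"

# merge_samples (hypothetical helper name): full outer join on column 1.
merge_samples() {
  awk '
    FNR == 1 {                    # first line of each file is its header
      nfiles++
      sample[nfiles] = $2         # remember the sample name for the header row
      next
    }
    NF >= 2 {                     # skip blank or incomplete lines
      seen[$1] = 1                # every gene found in any file
      status[$1, nfiles] = $2     # mutation status for this gene in this file
    }
    END {
      printf "Gene"
      for (i = 1; i <= nfiles; i++) printf "\t%s", sample[i]
      printf "\n"
      for (g in seen) {           # awk does not guarantee this order
        printf "%s", g
        for (i = 1; i <= nfiles; i++) {
          value = ((g, i) in status) ? status[g, i] : "NA"
          printf "\t%s", value
        }
        printf "\n"
      }
    }
  ' "$@"
}

# Keep the header first and sort the gene rows below it.
merge_samples "$workdir"/sample*.txt |
  { IFS= read -r header; printf '%s\n' "$header"; LC_ALL=C sort; } > "$workdir/merged.txt"

cat "$workdir/merged.txt"
```

For the two example files this should reproduce the output table above in tab-separated form. With the real data, something like merge_samples *.txt would take the place of the demo files; the sample columns then follow the order in which the shell expands the glob, so sample10.txt would come before sample2.txt.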
