Posted by Audrey (France)

Hi all,

I have a multi-FASTA file containing many sequences. All of them have a header like the following example:

>2624749465 radical S-adenosyl methionine domain-containing protein 2 [Selenomonas ruminatium S137 : Ga0066891_103]

I would like to keep only:

>2624749465_Selenomonas_ruminatium

As I'm still learning bash scripting, I tried the cut command:

cut -d ' ' -f1 your_file.fa > new_file.fa

Now I only have the gene ID, not the organism. How could I change my command to keep both? If you have other suggestions, please let me know!
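`cut` can only select fields split on a single delimiter, so it cannot easily pull the ID from the start of the header and the organism from inside the brackets in one pass. One possible alternative is a sed substitution. This is a minimal sketch, assuming every header ends with a `[Genus species ...]` block and that the organism name you want is the first two words inside the last pair of square brackets. `rename_headers` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: reads FASTA on stdin and writes FASTA on stdout.
# For header lines (starting with '>'), it keeps:
#   \1 = the ID (first word after '>')
#   \2 = first word inside the last '[...]' block (assumed genus)
#   \3 = second word inside that block (assumed species)
# and joins them with '_'. Sequence lines do not start with '>', so they
# pass through unchanged. Headers that do not match the pattern (e.g. no
# brackets, or only one word inside them) are also left unchanged.
rename_headers() {
  sed -E 's/^>([^ ]+) .*\[([^] ]+) ([^] ]+).*$/>\1_\2_\3/'
}
```

Usage with the file names from the question: `rename_headers < your_file.fa > new_file.fa`. Because `.*\[` is greedy, the match anchors on the last `[` in the header, so square brackets appearing earlier in the protein description should not confuse it.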

Thank you in advance for your help.

Have a great day