Hi all,
I have a multi-FASTA file containing a lot of sequences. Each of them has a header like the following example:
>2624749465 radical S-adenosyl methionine domain-containing protein 2 [Selenomonas ruminatium S137 : Ga0066891_103]
I would like to keep only:
>2624749465_Selenomonas_ruminatium
As I'm still learning bash scripting, I tried the cut command:
cut -d ' ' -f1 your_file.fa > new_file.fa
This gives me only the gene ID (>2624749465) but not the organism. How could I change my command line to get both? Or if you have other suggestions, please let me know!
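One direction that might work, as a rough sketch that I have not validated on the whole file: a sed substitution that keeps the ID plus the first two words inside the square brackets. It assumes every header has the form ">ID description [Genus species ...]" and that the description itself contains no "[" character. The function name shorten_headers is just a hypothetical helper for illustration.

```shell
# Hypothetical helper: reads FASTA on stdin, writes FASTA with shortened headers.
shorten_headers() {
  # Only lines starting with '>' are rewritten; sequence lines pass through unchanged.
  #   \1 = '>' plus the ID (everything up to the first space)
  #   \2 = first word after '[' (assumed to be the genus)
  #   \3 = second word after '[' (assumed to be the species)
  sed -E '/^>/ s/^(>[^ ]+) [^[]*\[([^ ]+) ([^] ]+).*$/\1_\2_\3/'
}

# Usage, with the same file names as in my cut command:
#   shorten_headers < your_file.fa > new_file.fa
```

Headers that do not match the assumed pattern would be left as they are, so they could be spotted afterwards with something like grep '^>' new_file.fa | grep ' '.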
Thank you in advance for your help
Have a great day